{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to train your own word vector embeddings with Keras"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Many tasks require embeddings for domain-specific vocabulary that models pre-trained on a generic corpus may represent poorly or not at all. Standard word2vec models cannot assign vectors to out-of-vocabulary words and instead use a default vector, which reduces their predictive value.\n",
    "\n",
    "For example, when working with industry-specific documents, the vocabulary or its usage may change over time as new technologies or products emerge, so the embeddings need to evolve as well. In addition, corporate earnings releases use nuanced language that GloVe vectors pre-trained on Wikipedia articles do not fully capture.\n",
    "\n",
    "We will illustrate the word2vec architecture using the Keras library, which we introduce in more detail in the next chapter, as well as the more performant gensim adaptation of the code provided by the word2vec authors."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the word2vec network architecture, we use the Financial News data that we first introduced in chapter 14 on Topic Modeling. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you have not yet installed `TensorFlow 2`, uncomment and run one of the following cells: the first if you have a GPU, the second otherwise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:50.255226Z",
     "start_time": "2020-06-21T17:15:50.253630Z"
    }
   },
   "outputs": [],
   "source": [
    "# !conda install -n ml4t tensorflow-gpu -y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:50.263802Z",
     "start_time": "2020-06-21T17:15:50.256450Z"
    }
   },
   "outputs": [],
   "source": [
    "# !conda install -n ml4t-text tensorflow -y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:51.546104Z",
     "start_time": "2020-06-21T17:15:50.265140Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "from pathlib import Path\n",
    "from collections import Counter\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from scipy.spatial.distance import cdist\n",
    "\n",
    "import tensorflow as tf\n",
    "from tensorflow.keras.models import Model\n",
    "from tensorflow.keras.layers import Input, Dense, Reshape, Dot, Embedding\n",
    "from tensorflow.keras.preprocessing.sequence import skipgrams, make_sampling_table\n",
    "from tensorflow.keras.callbacks import Callback, TensorBoard\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:51.570940Z",
     "start_time": "2020-06-21T17:15:51.546981Z"
    }
   },
   "outputs": [],
   "source": [
    "gpu_devices = tf.config.experimental.list_physical_devices('GPU')\n",
    "if gpu_devices:\n",
    "    print('Using GPU')\n",
    "    tf.config.experimental.set_memory_growth(gpu_devices[0], True)\n",
    "else:\n",
    "    print('Using CPU')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": ""
    }
   },
   "source": [
    "### Settings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:51.581269Z",
     "start_time": "2020-06-21T17:15:51.571815Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "sns.set_style('white')\n",
    "np.random.seed(42)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Paths"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:51.592943Z",
     "start_time": "2020-06-21T17:15:51.582571Z"
    }
   },
   "outputs": [],
   "source": [
    "results_path = Path('results', 'financial_news')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:51.602250Z",
     "start_time": "2020-06-21T17:15:51.597005Z"
    }
   },
   "outputs": [],
   "source": [
    "analogy_path = Path('data', 'analogies-en.txt')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:51.611641Z",
     "start_time": "2020-06-21T17:15:51.604960Z"
    }
   },
   "outputs": [],
   "source": [
    "def format_time(t):\n",
    "    m, s = divmod(t, 60)\n",
    "    h, m = divmod(m, 60)\n",
    "    return f'{h:02.0f}:{m:02.0f}:{s:02.0f}'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## `word2vec` - skipgram Architecture using Keras"
   ]
  },
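  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The skip-gram variant of word2vec learns embeddings by predicting the words that surround each target word. As a minimal sketch in plain Python, independent of the Keras `skipgrams` utility used later, the hypothetical helper `toy_skipgram_pairs` below lists the (target, context) pairs that a symmetric window of `window` tokens on either side produces for a toy sequence of token ids. It omits the negative samples and the subsampling of frequent words that the Keras utility can add."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def toy_skipgram_pairs(sequence, window):\n",
    "    # collect (target, context) pairs: every token at most `window` positions\n",
    "    # away from the target, excluding the target position itself\n",
    "    pairs = []\n",
    "    for i, target in enumerate(sequence):\n",
    "        start = max(0, i - window)\n",
    "        end = min(len(sequence), i + window + 1)\n",
    "        for j in range(start, end):\n",
    "            if j != i:\n",
    "                pairs.append((target, sequence[j]))\n",
    "    return pairs\n",
    "\n",
    "toy_pairs = toy_skipgram_pairs([1, 2, 3, 4], window=1)\n",
    "toy_pairs"
   ]
  },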
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Settings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:51.623147Z",
     "start_time": "2020-06-21T17:15:51.613943Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "LANGUAGE = 'en'\n",
    "SAMPLE_SIZE = .5            # portion of sentences to use for model\n",
    "NGRAMS = 3                  # Longest ngram in text\n",
    "MIN_FREQ = 10               # minimum token count to include a token in the vocabulary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:51.633056Z",
     "start_time": "2020-06-21T17:15:51.624640Z"
    }
   },
   "outputs": [],
   "source": [
    "SAMPLING_FACTOR = 1e-4\n",
    "WINDOW_SIZE = 3\n",
    "EMBEDDING_SIZE = 300\n",
    "EPOCHS = 1\n",
    "BATCH_SIZE = 2500"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:51.642059Z",
     "start_time": "2020-06-21T17:15:51.634243Z"
    }
   },
   "outputs": [],
   "source": [
    "# Set up validation\n",
    "VALID_SET = 10      # Random set of words to get nearest neighbors for\n",
    "VALID_WINDOW = 150  # Most frequent words to draw validation set from\n",
    "NN = 10             # Number of nearest neighbors for evaluation\n",
    "\n",
    "valid_examples = np.random.choice(VALID_WINDOW, size=VALID_SET, replace=False)"
   ]
  },
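  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The validation settings above draw `VALID_SET` word ids from the `VALID_WINDOW` most frequent tokens and track their `NN` nearest neighbors. As a minimal sketch of how such neighbors can be found, the cell below ranks the rows of a small random stand-in embedding matrix by cosine distance with `scipy.spatial.distance.cdist`; the matrix and all names prefixed with `toy_` are hypothetical placeholders, not the trained embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.spatial.distance import cdist\n",
    "\n",
    "toy_rng = np.random.default_rng(0)\n",
    "toy_embeddings = toy_rng.normal(size=(20, 8))  # 20 hypothetical words, 8 dimensions\n",
    "toy_query_ids = np.array([3, 7])               # hypothetical validation word ids\n",
    "toy_nn = 5                                      # neighbors to report, like NN above\n",
    "\n",
    "# cosine distance between each query vector and every row of the embedding matrix\n",
    "toy_dist = cdist(toy_embeddings[toy_query_ids], toy_embeddings, metric='cosine')\n",
    "\n",
    "# the closest row is the query word itself (distance 0), so skip the first column\n",
    "toy_neighbors = np.argsort(toy_dist, axis=1)[:, 1:toy_nn + 1]\n",
    "toy_neighbors"
   ]
  },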
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:51.650684Z",
     "start_time": "2020-06-21T17:15:51.643065Z"
    }
   },
   "outputs": [],
   "source": [
    "FILE_NAME = f'articles_{NGRAMS}_grams.txt'\n",
    "file_path = results_path / FILE_NAME"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:51.660278Z",
     "start_time": "2020-06-21T17:15:51.651892Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "tb_path = results_path / 'tensorboard'\n",
    "tb_path.mkdir(parents=True, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Build Data Set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
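    "#### Tokens to ID\n",
    "\n",
    "1. Extract the *n* most common words, i.e., those that occur at least `MIN_FREQ` times, to learn embeddings\n",
    "2. Index these *n* words with unique integers, starting at 1 in descending order of frequency\n",
    "3. Create an `{index: word}` dictionary and its reverse `{word: index}`\n",
    "4. Replace the *n* words with their index, and all other words with the index 0 reserved for the dummy token `UNK`"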
   ]
  },
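  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On a toy corpus, these four steps could look as follows. This is a simplified sketch of the mapping the notebook builds below, with a hypothetical threshold `toy_min_freq` in place of `MIN_FREQ`: id 0 is reserved for `UNK`, and the remaining ids are assigned in descending order of frequency."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "toy_words = 'the bank raised the rate the bank cut costs'.split()\n",
    "toy_min_freq = 2\n",
    "\n",
    "# (token, count) pairs for tokens occurring at least toy_min_freq times, most common first\n",
    "toy_counts = [(t, c) for t, c in Counter(toy_words).most_common() if c >= toy_min_freq]\n",
    "\n",
    "# ids start at 1 so that 0 can represent every out-of-vocabulary token\n",
    "toy_id_to_token = {i: t for i, (t, _) in enumerate(toy_counts, start=1)}\n",
    "toy_id_to_token[0] = 'UNK'\n",
    "toy_token_to_id = {t: i for i, t in toy_id_to_token.items()}\n",
    "\n",
    "# replace every word by its id, falling back to 0 (UNK)\n",
    "toy_data = [toy_token_to_id.get(w, 0) for w in toy_words]\n",
    "toy_data"
   ]
  },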
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:53.539262Z",
     "start_time": "2020-06-21T17:15:51.661717Z"
    }
   },
   "outputs": [],
   "source": [
    "sentences = file_path.read_text().split('\\n')\n",
    "n = len(sentences)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:53.541860Z",
     "start_time": "2020-06-21T17:15:53.540192Z"
    }
   },
   "outputs": [],
   "source": [
    "max_length = 50"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:55.321239Z",
     "start_time": "2020-06-21T17:15:53.542603Z"
    }
   },
   "outputs": [],
   "source": [
    "sentences = [s for s in sentences if len(s.split()) <= max_length]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:15:55.325340Z",
     "start_time": "2020-06-21T17:15:55.323310Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Removed 59,438 sentences containing more than 50 tokens\n"
     ]
    }
   ],
   "source": [
    "print(f'Removed {n-len(sentences):,.0f} sentences containing more than {max_length} tokens')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:03.769677Z",
     "start_time": "2020-06-21T17:15:55.326609Z"
    }
   },
   "outputs": [],
   "source": [
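    "# sample SAMPLE_SIZE of the sentences without replacement and split them into tokens\n",
    "words = ' '.join(np.random.choice(sentences, size=int(SAMPLE_SIZE * len(sentences)), replace=False)).split()"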
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:08.058097Z",
     "start_time": "2020-06-21T17:16:03.770639Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# Get (token, count) tuples for tokens meeting MIN_FREQ\n",
    "token_counts = [t for t in Counter(words).most_common() if t[1] >= MIN_FREQ]\n",
    "tokens, counts = list(zip(*token_counts))\n",
    "\n",
    "# create id-token dicts & reverse dicts\n",
    "id_to_token = pd.Series(tokens, index=range(1, len(tokens) + 1)).to_dict()\n",
    "id_to_token.update({0: 'UNK'})\n",
    "token_to_id = {t: i for i, t in id_to_token.items()}\n",
    "data = [token_to_id.get(word, 0) for word in words]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:08.060825Z",
     "start_time": "2020-06-21T17:16:08.059008Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "vocab_size = len(token_to_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:08.076494Z",
     "start_time": "2020-06-21T17:16:08.061702Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "60545"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vocab_size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:11.422885Z",
     "start_time": "2020-06-21T17:16:08.077417Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "s = pd.Series(data).value_counts().reset_index()\n",
    "s.columns = ['id', 'count']\n",
    "s['token'] = s.id.map(id_to_token)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:11.435948Z",
     "start_time": "2020-06-21T17:16:11.424130Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>count</th>\n",
       "      <th>token</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>422022</td>\n",
       "      <td>UNK</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>173807</td>\n",
       "      <td>company</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>171384</td>\n",
       "      <td>million</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>150168</td>\n",
       "      <td>said</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>113295</td>\n",
       "      <td>year</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>108431</td>\n",
       "      <td>quarter</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>6</td>\n",
       "      <td>83870</td>\n",
       "      <td>financial</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>7</td>\n",
       "      <td>79069</td>\n",
       "      <td>reuters</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>8</td>\n",
       "      <td>75674</td>\n",
       "      <td>percent</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>9</td>\n",
       "      <td>75324</td>\n",
       "      <td>net</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id   count      token\n",
       "0   0  422022        UNK\n",
       "1   1  173807    company\n",
       "2   2  171384    million\n",
       "3   3  150168       said\n",
       "4   4  113295       year\n",
       "5   5  108431    quarter\n",
       "6   6   83870  financial\n",
       "7   7   79069    reuters\n",
       "8   8   75674    percent\n",
       "9   9   75324        net"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.sort_values('count', ascending=False).head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:11.447704Z",
     "start_time": "2020-06-21T17:16:11.437317Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "17985158"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s['count'].sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:11.521601Z",
     "start_time": "2020-06-21T17:16:11.448612Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
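    "# TensorBoard projector metadata: one label per row in id order; a single-column file takes no header\n",
    "s.sort_values('id').token.dropna().to_csv(tb_path / 'meta.tsv', index=False, header=False)"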
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Analogies to ID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:11.525117Z",
     "start_time": "2020-06-21T17:16:11.522523Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "def get_analogies():\n",
    "    # one analogy per line; lines starting with ':' are category headers\n",
    "    df = pd.read_csv(analogy_path, header=None).squeeze('columns')\n",
    "    analogies = df[~df.str.startswith(':')].str.split(expand=True)\n",
    "    analogies.columns = list('abcd')\n",
    "    return analogies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:11.595667Z",
     "start_time": "2020-06-21T17:16:11.526297Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>athens</td>\n",
       "      <td>greece</td>\n",
       "      <td>baghdad</td>\n",
       "      <td>iraq</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>athens</td>\n",
       "      <td>greece</td>\n",
       "      <td>bangkok</td>\n",
       "      <td>thailand</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>athens</td>\n",
       "      <td>greece</td>\n",
       "      <td>beijing</td>\n",
       "      <td>china</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>athens</td>\n",
       "      <td>greece</td>\n",
       "      <td>berlin</td>\n",
       "      <td>germany</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>athens</td>\n",
       "      <td>greece</td>\n",
       "      <td>bern</td>\n",
       "      <td>switzerland</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        a       b        c            d\n",
       "1  athens  greece  baghdad         iraq\n",
       "2  athens  greece  bangkok     thailand\n",
       "3  athens  greece  beijing        china\n",
       "4  athens  greece   berlin      germany\n",
       "5  athens  greece     bern  switzerland"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "analogies = get_analogies()\n",
    "analogies.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:13.456342Z",
     "start_time": "2020-06-21T17:16:11.596507Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7297574039067423"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "analogies_id = analogies.apply(lambda x: x.map(token_to_id))\n",
    "# share of analogies whose four words are all in the vocabulary\n",
    "analogies_id.notnull().all(axis=1).sum() / len(analogies_id)"
   ]
  },
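  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ratio above is the share of analogies whose four words are all in the vocabulary; only these can be evaluated. A common way to score an analogy `a : b :: c : d` such as athens : greece :: berlin : germany is to check whether `d` is the word whose vector is most cosine-similar to `b - a + c`, excluding the three query words. The sketch below applies this rule to a tiny hand-made embedding matrix; the 2-d vectors and all `toy_` names are made up for illustration, not learned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# hypothetical 2-d embeddings chosen so that country - capital is a constant offset\n",
    "toy_vocab = ['athens', 'greece', 'berlin', 'germany', 'paris', 'france']\n",
    "toy_vectors = np.array([\n",
    "    [1.0, 0.0],   # athens\n",
    "    [1.0, 1.0],   # greece\n",
    "    [2.0, 0.0],   # berlin\n",
    "    [2.0, 1.0],   # germany\n",
    "    [3.0, 0.0],   # paris\n",
    "    [3.0, 1.0],   # france\n",
    "])\n",
    "toy_index = {w: i for i, w in enumerate(toy_vocab)}\n",
    "\n",
    "def toy_solve_analogy(a, b, c):\n",
    "    # target vector b - a + c, e.g. greece - athens + berlin\n",
    "    target = toy_vectors[toy_index[b]] - toy_vectors[toy_index[a]] + toy_vectors[toy_index[c]]\n",
    "    # cosine similarity between the target and every word vector\n",
    "    norms = np.linalg.norm(toy_vectors, axis=1) * np.linalg.norm(target)\n",
    "    sims = toy_vectors @ target / norms\n",
    "    # the query words themselves are not allowed as answers\n",
    "    for w in (a, b, c):\n",
    "        sims[toy_index[w]] = -np.inf\n",
    "    return toy_vocab[int(np.argmax(sims))]\n",
    "\n",
    "toy_solve_analogy('athens', 'greece', 'berlin')"
   ]
  },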
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Generate Sampling Probabilities\n",
    "\n",
    "Computing the full softmax over all possible context words is expensive for large vocabularies. A faster alternative is [negative sampling](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf), a simplified form of Noise Contrastive Estimation (NCE) proposed in the linked word2vec paper.\n",
    "\n",
    "Instead of computing the softmax probability for all possible context words, it samples a small number of *negative* (noise) words, 5-20 for small datasets and as few as 2-5 for large ones according to the paper, and evaluates the probability only for these and the true context word."
   ]
  },
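  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the saving concrete, the sketch below compares, for a single (target, context) pair, the full-softmax loss, which needs a dot product with every row of the output embedding matrix, with the negative-sampling loss, which uses only the true context word and `toy_k` noise words. The random matrices, sizes and word ids are hypothetical, and the noise words are drawn uniformly for simplicity, whereas the word2vec paper draws them from the unigram distribution raised to the 3/4 power. The two loss values measure different objectives and are not directly comparable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "toy_rng = np.random.default_rng(42)\n",
    "toy_vocab_size, toy_dim, toy_k = 10_000, 50, 5  # hypothetical sizes\n",
    "\n",
    "# separate (randomly initialized) input and output embedding matrices\n",
    "toy_w_in = toy_rng.normal(scale=0.1, size=(toy_vocab_size, toy_dim))\n",
    "toy_w_out = toy_rng.normal(scale=0.1, size=(toy_vocab_size, toy_dim))\n",
    "toy_target, toy_context = 17, 42  # hypothetical word ids\n",
    "toy_v = toy_w_in[toy_target]\n",
    "\n",
    "def toy_log_sigmoid(x):\n",
    "    # numerically stable log(1 / (1 + exp(-x)))\n",
    "    return -np.logaddexp(0, -x)\n",
    "\n",
    "# full softmax: requires a score for every word in the vocabulary\n",
    "toy_scores = toy_w_out @ toy_v\n",
    "toy_softmax_loss = -(toy_scores[toy_context] - np.logaddexp.reduce(toy_scores))\n",
    "\n",
    "# negative sampling: the true context word plus toy_k noise words (uniform here)\n",
    "toy_negatives = toy_rng.integers(0, toy_vocab_size, size=toy_k)\n",
    "toy_neg_loss = -(toy_log_sigmoid(toy_w_out[toy_context] @ toy_v)\n",
    "                 + toy_log_sigmoid(-(toy_w_out[toy_negatives] @ toy_v)).sum())\n",
    "\n",
    "print(f'softmax: {toy_scores.size:,} dot products, loss {toy_softmax_loss:.3f}')\n",
    "print(f'negative sampling: {1 + toy_k} dot products, loss {toy_neg_loss:.3f}')"
   ]
  },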
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
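    "**SAMPLING_FACTOR**: used to generate the `sampling_table` argument for `skipgrams`, which subsamples frequent words (a technique distinct from negative sampling that the word2vec paper also introduces).\n",
    "\n",
    "`sampling_table[i]` is the probability of sampling the i-th most common word in a dataset.\n",
    "\n",
    "The sampling probabilities are generated according to the sampling distribution used in word2vec:\n",
    "\n",
    "$p(\\text{word}) = \\min\\left(1, \\frac{\\sqrt{\\frac{\\text{word frequency}}{\\text{sampling factor}}}}{\\frac{\\text{word frequency}}{\\text{sampling factor}}}\\right) = \\min\\left(1, \\sqrt{\\frac{\\text{sampling factor}}{\\text{word frequency}}}\\right)$\n",
    "\n",
    "According to the Keras documentation, `make_sampling_table` derives each word's frequency from its rank by assuming that word frequencies follow Zipf's law, so the table is only meaningful if token ids are assigned in descending order of frequency, as done above."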
   ]
  },
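  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell below evaluates this formula with NumPy for a few hypothetical word frequencies (shares of all tokens), using the `SAMPLING_FACTOR` value of `1e-4` set above. It implements only the formula, not the rank-based frequency approximation that Keras applies. A word that accounts for 1% of all tokens is kept with probability 0.1, while words at or below the sampling factor are always kept."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "toy_sampling_factor = 1e-4  # same value as SAMPLING_FACTOR above\n",
    "# hypothetical relative word frequencies (share of all tokens)\n",
    "toy_freq = np.array([1e-2, 1e-3, 1e-4, 1e-5])\n",
    "\n",
    "toy_ratio = toy_freq / toy_sampling_factor\n",
    "# formula as written above; equivalent to min(1, sqrt(sampling_factor / frequency))\n",
    "toy_keep_prob = np.minimum(1.0, np.sqrt(toy_ratio) / toy_ratio)\n",
    "toy_keep_prob"
   ]
  },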
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:14.101456Z",
     "start_time": "2020-06-21T17:16:13.457164Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYMAAAD3CAYAAAD/oDhxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3deVxVdf7H8de5+4ULiIIbiDtqbohmWi7VZFRTZlpJi/2mbVpmaZqazDatyKypZsqaxiybyaaS1Eory8zGXRtRMFxzCQU3QEQuXO52zu+PCyi5IV04l8vn+Xj48J5zz/mezxfxvu/ZvkfRNE1DCCFEs2bQuwAhhBD6kzAQQgghYSCEEELCQAghBBIGQgghAJOeG7/gggtISEjQswQhhGhyCgoKWLduXVDb1DUMEhISmD9/vp4lCCFEkzN27NigtymHiYQQQkgYCCGEkDAQQgiBzucMhBDiVLxeL/n5+VRWVupdiq5sNhuJiYmYzeYG35aEgRAi5OTn5xMVFUWnTp1QFEXvcnShaRrFxcXk5+fTuXPnBt+eHCYSQoScyspKWrVq1WyDAEBRFFq1atVoe0cSBkKIkNScg6BaY/4MdA2DcrdPz80LIYSoomsY5Je49Ny8EEKcUU5ODhMmTDhp/tKlSxk3bhzjx48nMzNTh8qCT9cTyPJUHSFEqJo5cyYLFizAbrfXmu/1enn++eeZO3cudrudm266iUsuuYT4+HidKg0OuZpICBHS5mXlk7l+X1DbvHFQB8YNTDzjMklJSUyfPp1HHnmk1vxdu3aRlJRETEwMAAMHDmT9+vVceeWVQa2xsckJZCGEOIW0tDRMppO/LzudTqKiomqmIyMjcTqdjVlag5DDREKIkDZuYOJZv8U3JofDQXl5ec10eXl5rXBoqvTdM5A0EEI0MV27diUvL4+jR4/i8XhYv349AwYM0LusX0zncwaSBkKIpmHhwoVUVFQwfvx4Hn30Ue688040TWPcuHG0adNG7/J+MTmBLIQQp5GYmFhz6eg111xTM//SSy/l0ksv1ausBiEnkIUQQkgYCCGE0DkM5IyBEEKEBtkzEEIIIWEghBBCwkAIIQR6h4GcNBBChCBVVXnqqacYP348EyZMIC8vr9b7Zxq19HQjnYY6uc9ACCF+ZsmSJXg8HubMmUN2djbTpk3jzTffBM48aunpRjptCmRsIiFEaMv+EDa+H9w2B9wKKTed9u2srCyGDx8OQEpKCrm5uTXvnWnU0tONdNoUBPUw0datW7nlllt49NFHWbt2bTCbFkKIRuN0OnE4HDXTRqMRn89X897pRi093UinTUFQq960aRNxcXEYDAa6d+8ezKaFEM1Vyk1n/BbfEH4+MqmqqjUf8jJqaR0MHDiQjIwM7r77bt55551gNi2EEI0mNTWV5cuXA5CdnU1ycnLNezJqaR1s3bqV+Ph4YmJi8Pv9wWxaCCEazahRo1i1ahXp6elomsbUqVNl1NJqOTk5vPTSS8yePRtVVZkyZQrbt2/HYrGQkZFBx44dSUhI4Nlnn8VsNvO73/3urG3KCWQhRCgyGAw888wzteZ17dq15vWZRi09caTTpqROYfDzy6VOd9lVamoqqampDVqwEEKI4KvTOYPqy6Wqnemyq3Mj+wZCCBEK6hQGP79c6kyXXZ0TyQIhhAgJ9bqa6EyXXQkhhGh66hUGZ7rs6lzIjoEQQoSGen2dP9VlV0IIIZquOofBiZdLneqyKyGECBenu3y+2tKlS3njjTcwmUyMGzeOG2+88bTr5OXl8eijj6IoCt27d2fy5MkYDIGDMkeOHCE9PZ2FCxditVr16i6g9xDWgKbJwSIhRGg58fL5hx56iGnTptW8Vz1q6axZs5g9ezZz5syhsLDwtOs8//zz/OlPf+KDDz5A0zS+/fZbAFasWMEdd9xBUVGRLn38Od3P+moaKIreVQghQtWCXQv45MdP
gtrmdd2vY3TX0ad9vz6jlmZnZ59ync2bNzN48GAARowYwapVqxg1ahQGg4F3332XcePGBbVv9aV/GOhdgBBC/MzpLp83mUynHbX0dOtomoZS9Y03MjKSsrIyAC666KJG6k3d6BoGEbjxqxpGg+waCCFObXTX0Wf8Ft8Q6jNq6enWqT4/UL1sdHR0I/Tg3Ol6zqCrYT8+n1fPEoQQ4iT1GbX0dOucd955rFu3DoDly5czaNCgRu5N3eh+mMhbWQE2fc+iCyHEieozaunpLrmfOHEiTz75JK+88gpdunQhLS1N596dmqLpeDnP2KFdmPnZClq1TtCrBCFECNq6dSu9evXSu4yQcKqfxdixY5k/f35Qt6P7paU+d4XeJQghRLOnexioHpfeJQghRLOnexh4Zc9ACCF0p3sYyGEiIYTQn/5hUClhIIQQetM/DGTPQAghdKd7GPg9EgZCiNCUk5PDhAkT6rx8dnY2N9xwA+np6bz++us18+fPn88NN9zA2LFjeeONNxqi1F9M95vOZM9ACBGKZs6cyYIFC7Db7XVeZ/LkyUyfPp0OHTrw29/+ls2bNxMVFcWHH37I7NmzsVgsvPbaa3i9XsxmcwNWf+50DwNvZfnZFxJCNFtHP/2U0nnBvcEqZtxYWowZc8ZlkpKSmD59Oo888ggA27dvJyMjA4AWLVowderUWgPWOZ1OPB4PSUlJAAwbNow1a9bgcDjo06cPEydOpLCwkHvvvTfkggBC4DCRz1WmdwlCCHGStLS0Ws92f/LJJ5k8eTKzZ89mxIgRvP3227WW//mopdUjlJaUlLB+/Xqee+45pk+fTkZGBseOHWu0ftSV7nsGfrdT7xKEECGsxZgxZ/0W3xh27drF008/DQQecNO5c2fef/99vv76awCmTZt20mim0dHRREREMHjwYBwOBw6Hg65du/LTTz/Rr18/XfpxOrqHgeKWPQMhROjr3LkzL7zwAu3btycrK4vCwkKuuOIKbr311pplzGYze/fupUOHDqxcuZLf//732Gw2PvjgA9xuN36/v+bhOKFG9zAwu0v0LkEIIc5qypQpTJw4Eb/fD8Bzzz130jJPP/00Dz/8MH6/n2HDhtG/f38Axo0bx0033YSmadx///20aNGiUWuvC11HLf310F48P7o9/SZ9q1cJQogQJKOWHtcsRi1VFRN2n+wZCCGE3oIeBsXFxYwdO7ZOy2oGIzF+CQMhhNBbUMNA0zTefvttEhLq9rAa1WCmpXYUze8LZhlCCCHOUVDD4MMPP+Saa67Baq3jYywNZkyKStmRA8EsQwghxDkKahisXr2ajz76iB9++IFFixadfeMmCwAlh/YGswwhhBDnqM5hcOKATaqq8tRTTzF+/HgmTJhAXl4eAK+//jrPPPMMffv25corrzz7xk2BW7KPHZYwEEIIPdUpDGbOnMkTTzyB2+0GYMmSJXg8HubMmcNDDz3EtGnTai3/0ksv1WnjJnPgcFJ54b5zqVkIIRpFsEYtff7557n++uu58cYbycrKaohSf7E6hUH1gE3VsrKyGD58OAApKSnk5ubWa+Mms4VyzQZF2+u1vhBCNJSffwmui8mTJ/Pyyy/z4YcfkpOTw+bNm9m2bRsbN27k448/5sUXXzzlzWqhoE53IKelpZGfn18z/fMBmYxGIz6fr9agTnVVYE4isnTnOa8nhGgetq09wNZVwb3IpNdF7eg5pN0ZlwnWqKVjx47FZrPh8XhwOp31+pxsDPU6gexwOGoNyKSqar07eCyqK+08e+q1rhBCNJRgjVpqMpkwGAxceeWV3H777dxxxx2N1odzUa9P8NTUVL777juuuuoqsrOzSU5OrncB3lY9iStZxNHC/bSIb1/vdoQQ4annkLN/i28M9R219NNPPyUuLo533nmH8vJybr75ZgYMGECbNm106cfp1CsMRo0axapVq0hPT0fTNKZOnVrvAiKSBsBOKNj2PS3i9R+mVgghTqW+o5bu2bOHiIgI
jEYjkZGRWCyWWqERKuocBomJiWRmZgJgMBh45plnglJAUq/BsBRK92yE4RIGQojQVN9RS/v06cOGDRtIT0/H7/dzzTXX0KVLl8Yu/6x0P5MRG9+OA0o8tkMb9C5FCCFqOfFLcJ8+fZg9e/YZl09JSalZvprRaAzal+eGpPtjLwEKIvvSvnyL3mUIIUSzFRJh4EsYTFuK2L9rs96lCCFEs6RrGGwp3sLGwxuJH3AVAIezv9CzHCFECNHxuVshozF/BrrvGdy26DY6duvDPq01lp/+q3c5QogQYLPZKC4ubtaBoGkaxcXF2Gy2Rtme7ieQAQxGhd0xFzDo2DdoPjeKqY5DYAshwlJiYiL5+fkUFhbqXYqubDYbiYmJjbKtkAiD/u/15x/dHyIyayH5m/5LYmqa3iUJIXRkNpvp3Lmz3mU0K7ofJqpm6HUeXs3IkU1f6V2KEEI0O7qHway0WQDcu+pONpt64MhfpnNFQgjR/OgeBue3PZ9YaywAT3ew0MW3i/xdcs+BEEI0Jt3DAGDJDUsA2KkcoVAxkv/dTJ0rEkKI5iUkwsBitPBA6gMAjOqYQFL+Z6g+n85VCSFE8xESYQBwV9+7GNd9HH4F3ojT2LJyvt4lCSFEsxEyYQAweehk+rXqy4IoB99vfE3vcoQQotkIqTBQFIV3rphFRy2Gf7YoYfkGGZ5CCCEaQ0iFAYDNZOPvv3oDgP+sf1nnaoQQonkIuTAA6NahP4N8LcgxH2JnwU96lyOEEGEvJMMA4MaU+yg3GHjvq0l6lyKEEGEvZMNg5KBb6OkxsZJNHCpp3oNVCSFEQwvZMFAUhZu73UqhycDMzx/TuxwhhAhruoZB55gzj0o4ZuSDdPMo/Ne9mmPOo41UlRBCND+6hoHdZD/j+4rBQHqn/+OQ2cDf5t/fSFUJIUTzE7KHiaqNH/UQ/d0WvvLlkHdwr97lCCFEWApqGOTm5vLQQw8xceJEioqKgtbuXSkP4TQa+NuCe4LWphBCiOOCGgZut5vJkyczcuRIsrOzg9buxYNvZoQ3lmWWfXy9dl7Q2hVCCBEQ1DAYOHAgO3fuZNasWfTq1SuYTTPp6reJVDX+uelpPB53UNsWQojmLqhhsGnTJvr06cPMmTN5//33g9k0iW2TSY+5gp1WjYwPJwS1bSGEaO7qHAY5OTlMmBD4EFZVlaeeeorx48czYcIE8vLyACgvL+exxx4jIyODtLTgP9T+/jEvMrDSzpfqFpZ+Pzfo7QshRHNlqstCM2fOZMGCBdjtgUtBlyxZgsfjYc6cOWRnZzNt2jTefPNNhg4dytChQxusWIPRyMO/ms7vl93Bcz9MpnNCHzon9Gyw7QkhRHNRpz2DpKQkpk+fXjOdlZXF8OHDAUhJSSE3N7dhqjuFPt0uYGLPRzliVMj44nZUv7/Rti2EEOGqTmGQlpaGyXR8J8LpdOJwOGqmjUYjvkZ8TOWVF03g1/Tge6uTjA9ua7TtCiFEuKrXCWSHw0F5eXnNtKqqtcKiMUy59SMGVtqZ58/h318816jbFkKIcFOvMEhNTWX58uUAZGdnk5ycHNSi6sJkMjNt3Dw6eRVeO/whsxY+0+g1CCFEuKhXGIwaNQqLxUJ6ejrPP/88kybp88yBtnEdeO3X80j0KbxWnMkHX/9VlzqEEKKpUzRN0/Ta+NixY5k/f/4vbmffwd3ct/BajhhVnu8/lZEDrw1CdUIIEZqC9dl5opAfqK4uOrTtwpQLp2PW4LGcx1iz6Su9SxJCiCYlLMIAYFDvi/nr4FcxAI/97yFWbFigd0lCCNFk6BoG/iNH0IJ4SergvpfxdJ+n8QOTsifxxcp/Ba1tIYQIZ7qGgffAAfY/8khQ27x08PVMTX0BqwbP/PhXPlr8SlDbF0KIcKT7YaJjXy4KepvDBlzNqyNn0cqnMG3/LN78RJ+rnYQQoqnQPQwAvIcOB73NPt0u4O0xX9DVY+TN0oU89/5vZOgKIYQ4jZAIg50jRzZIu+3j
O/LPGxbRz23lI38Wv3l7CPsO7m6QbQkhRFMWEmEAcGDKlAZpNz62Pf+6cy03GPrxg9XFXZ+P5vsfljTItoQQoqnSPQx6ZG8E4OhHc/BUPRch2EwmM09N+A+PJd6L06Dx8PcP8PGS1xtkW0II0RTpHgYGm412z2UAsCvtigbd1g2X/Z5pKc9j1RQy8v/JU/++EZ/P26DbFEKIpkD3MABoMW5czeuypUsbdFvDU0cze8wiBrodfMJWbp01mOztKxt0m0IIEepCIgwAuq9cAUD+/b+jcsuWBt1W27gOvH3XKm4xD2aP2cP9q+5h1sKnG3SbQggRykImDExxcXT6OBOAvb+9B19JSYNuz2A08ujN7/DqoL/R0m/gb0fm8vuZl1BYsr9BtyuEEKEoZMIAwN63LwmvvYr/6FF+vPAivAcONPg2h/S9nPfTv+NSTxuWWYq4Zd7lcnJZCNHshFQYAERffjmJr70GmsaBJ56kMUbYbhEVx6t3L2FS29vwKBrPFMzgjhlD2LBtRYNvWwghQkHIhQFA1KWXEHvzTZSvWsX+iRMbbbs3p/2FOWMWc7k3gRyLk7vW3sekWaMpKS1stBqEEEIPIRkGAG0efxxb794cW7CQkjmZjbfdVgm8fNdXzBj8Ov3dkXxu3MO18y7mxQ9/S0Vl+dkbEEKIJihkw0AxGun43r8xxMRw8OmnqdiwoVG3P6j3xbx7zzomtf0NcT4jsz1ruO79C3j382dljCMhRNgJ2TAAMERG0vXzhViSkth33/24f/yx0Wu4Oe0h5t65kfuirsSPxivFmVz/zgDe+uxxuWFNCBE29A0DRTnrIqb4eDq8NQNFUdh9zWjK165thMJqMxiN3D/2RT65aRXpxlRKjX6mH13Ate+m8sqc31HqPNLoNQkhRDDpGga2886r03KWpCQ6zfkIc/v27P3N7RydN6+BKzu1qMgWPH7rv1l4y/fcbhuBqmi8W7mcqzOH88S749h3oPH3XIQQIhhC+jDRiSwdO9Ipcw7mxEQOPP4E+X98ANXj0aWWCFskfx7/Bgt/s5E/t7qRBJ+Fzww7GPfVGB6YeRlrf1isS11CCFFfQQ2DNWvW8Mgjj/DHP/6Rbdu2BbNpIHCXcpfPPiXm2tGULV7MvrvuxltQEPTt1Lkek5nbr36Sj367kRe6/oU+XgffmQ9y94aHuOGtFF77+E94PG7d6hNCiLoKahi4XC5eeOEF7r33XlaubJjB3wyRkbR/4QXaTpmCKzubXdeMpmROJpqqNsj26uqqYbcx67freP/Ct0jzJbLH7GNmxbcM/08qf347TZ6hIIQIaUENg0svvRSXy8Xs2bO57rrrgtn0SWLTx9Pxww8wt23LwcmT2T7ofFw5OQ26zbrol3whL925iO9uXMkE8xDa+gx8Y97PnRseZPTMvjz93k3sKQj+XpMQQvwSQQ2DkpISnnvuOf74xz/SqlWrYDZ9Svbevemy4DPav/gCBrOZn8anczDjOd3OJZwoKrIFj9w8k8/u/oFXejzBJZ54iowqc7VcRi+5gZveSuXlj+6TK5GEECGhzmGQk5PDhAkTAFBVlaeeeorx48czYcIE8qqeUPb8889z6NAhXn75Zb766quGqfhnFJOJmNGj6Tx/HpEjR1Dy/vvsvPRXOJcta5Tt18WoIeN57e6lrPy/Tfwl/ibOr4wk1+rlX+6VDJs3khvfSuGZ2beQt3+H3qUKIZopRavDSHAzZ85kwYIF2O12MjMzWbx4MUuXLmXatGlkZ2czY8YM3nzzzXPe+NixY5k/f369Cj+do/PmU/jqq/gOH8Y+aCDtJk/G2r17ULcRDJXuCt5a8Bgbjqxmk7UCb9U9F93dCim2PvzfZVPo2D5Z3yKFECGpIT4767RnkJSUxPTp02ums7KyGD58OAApKSnk5uYGtahfosW4sXRZuID4P/8Z99Zt7B5zHQefzUCtqNC7tFps1gj+eMPf+dc937N+Qg4Pxd3IBZUO9lhUPtZ+4Opv
xjH2rX5MmjWaldlf6l2uECLM1SkM0tLSMJlMNdNOpxOHw1EzbTQa8fl8wa+unowxMcT99m66fv0VUZdeSsl//sP21IHsn/RYozwj4VwZjEZ+8+snefueNfzv1sC9C+dXRnLQ5Odz4x7uy5nIZW/35vczL+a9L6dSXlGmd8lCiDBjOvsiJ3M4HJSXHx/BU1XVWmERKkxxcSROfw3nsmUcnTuP0oULKf3kE1rdfTfxf/g9isWid4knqb534XaeRPX7WfJ9Jl/kvs0PhoMsMxezrPBD/vrxh/RwG+hi6sClPW/i8gvSMRiNepcuhGjC6nU1UWpqKsuXLwcgOzub5OTQPrbtGDmSxOmv0enDD7D17k3xzJlsSx1I/h/+EHKHj05kMBq5fOhNvHr3tyy9czNzR7zHzaaBDKy0s8/sY5Exj7/8OI0h7/VjwoxBvPzRfWzdnaV32UKIJqheX+dHjRrFqlWrSE9PR9M0pk6dGuy6GoS9b186fZyJc+lS9k96jLJvlrD9m4FEX3UVbZ54HFPLlnqXeEY9Og9gUud/AaD6/azI/pyvN/2LLd5dZNvcZLtX8q8VK4lYptLVY6WnvSdXD7qH1PNG6lu4ECLk1elqoobSEGfE60rTNMpXrKBkTibOb78FIPLCobR95lksiQm61PRLlJUfZebnT7CjNIedhiMcMh/f6TNqGj09ZrpZuvCr3jdzUf+rsVisOlYrhPglGuKzs9mGwYnKln7Hgccfx19SUjOvzWOPEXvrLSiGJjOWXy2HiguYt+xVcovWsctQzH5z7eHCu7sV+lh78Ks+tzK07xUSDkI0IRIGDax83fccyngW9487a83vMHMmjuHDdKoqOCoqy8n89hVyDqxgp3aAn3527ry7W6GzoT0jul/PNcNulxPSQoQwCYNG4ispoWT2+xT94x+15ps7dCDxjdexhfgJ87ooPnqQT5b/gx8OrWavdpid1tq/Bh08Gq3VCHpHD6B/p4sZ2OsSWrVoq1O1QogThV0YDB80is8XfUZMfIReJZyVKyeHopkzcS75tvYbZjMdXp+OY2T4nJxduHwWy3Z8zG61gIMmP2XG2ofIOnsgXnPQ3XEeA7tczvAB12Czhu6/nRDhKizDIH3QJH73z0v1KuGcOJct4/Df/o77FM9qiLl+HG0nTcIQGalDZQ1j595cVv3wGZsPruag7zA/WF34fvao0nifSj+tPQ5jNOd3TiO152V0aNtFp4qFaB7CNgzuf/MSlDo8DzmUVKxfz5F/v4dz2TK0n42SauvTh9YPP0zkkAt0qq7h5B/+icXr3iP74H85oB1hr9lLxc9Ossf7VNr6rLQztqZH/Pn06nA+F/X/tZyHECJIwjYMgCazd3AqfqeTkg8+pPTTT/Hs3n3S+6bWrUl45WUiBg3SobqGV1JayKK177HlwGqKPYfJNh3BpyhUGo4HvFXVaONTaK9FkWhLYlCXKziv8xDaxSXJoSYhzlFYh8GdLw3H5jDrVUrQaJqGa/16Ch7+C75Dh069kMFA/AMP0PLWW8LqsNLPLcv6jO3537PmYOApb+ttp77bO7XShqbAxe2uxmQ0M6T3VSR3TGnMUoVoUsIyDN585V8s+2A7QJM8XHQ2ms9H0T9nUPz222iVladdztSuHS1vvYUW48djPGEQwHDi8bgpqyjhu6x5/Hjwf+xwbqXIUEGhSaX8Z4ea2nhVVAV6+lvRxpaAyWDhzisyaBHVCpPRjMnU9L84CFFfYRkGK9Z/wxv3LgUg5bIOXHR96D17INgq1q+n6J8zKD/Lc6KjrryCyAuGEH311Rgd4bsHofr9LF0/H5e7jA0/fcOByn148fO91XnadQZW2mlhiEZRDNw4+M9ER8ZhNdvoltSnESsXQh9hGwaVTi/vPLwCgLv/PgKLLfRGQG1Imqbh2riRI+++S9k3S864bMTQIdhTUoi6+GJsvXqF5MirwVJRWU65qxSAdxdN5pj7COX+MlYZCzBpnHTpK0CiVyNJ
bQFArxapXNT7WgDsVgd9uoXfCX3RPIVtGAAseHUj+7aW0L57C657KFWvkkKGWlFBxYaNlC74jGMLP4cz/DMpdju23ucRc/XVRA4ZgjkpqckOo3Gu3vrsSUorDqNpKl+7VqNU/ZhOHJupWgePRgctEBQxxhhuv+QZAAyKge5J/eRqJ9FkhHUYaJrGwuk57NtyhAGXJ3Hh2G56lRWyNFXFm5+P87//xZWzibKlS9FcrtOvYDYTkZqKrU9v7CkpRAwYgDE2FqUZfOit3Pg5WTsDv1t+1cvnzmUYzhAUsT6VgdrxAQpHdh3HmEvuaZRahThXYR0GAH6fyn+eWkvZkUquvKcvXQbE61Vak6JpGpW5uZSvWo17xw4q1q/Hd/jwGdcxd0wicuhQ7P1TiEgdgKltWwzW5jFY3cGifXzw7TR8qheAb8tXc9R4/L9B9X0Thqr/GmrVRQ3pxoGYjccPy/XucCG/HvabRqpaiOPCPgwAPJU+FryazaE9x/jV//Wi59B2OlUXHnwlJVTm5lLx/fdUbt6Me9fu01/yCpiTkrCn9MfWowfmhAQsnTtj7dwZTKZmc+jps2Vvs2bXgprpPb59bLGe+rGu3d21r36L0ez8dujTtea1cMTTq8vA4Bcqmq1mEQYA7govb/95BRExFtKfHIzdEb4nSfWiulxU5ubi+iGXyi1bcOXk4N2374zrmBMSMETYsXTqjK1vX6zdu2Ht2hVTmzYoRiNKCD76NFj2FGyj0n38Ua/fZc9hdeG3nPifZ5PNc/KKVS5wO2hraV9rXqQ5iodumCHDh4tz1mzCAODQT8eYO209BoPCXX8bgdka/se5Q4Hm9eI7fBjPvn14fsrDtWkT3vx8NLcbV07OGde1dOyIv7SUiMGDMXdIxBQfj71vXwyOKKzdu4X9nsWh4gLmL3ut5vATwFHXYTLV0//covwqbX21f7cVFC6Pv4qxF//hpOUNikFGjxXNKwwAspfsZdXcnST0iGX0AykYDOF1Q1pTpHm9eA8dxpP3E56dO/Hs3Yd7507QNCq+/x7Fbj/jSW1DZCTWnj0xOhzY+vTBGBuLrWcPDJGR2Hr1CozzZDaH1c2HHo8bn1r7MFNxyX4mL7wVt3by4acz7WEADHDbuLzD9apAfswAABsgSURBVKd8z2AwcOWQ3xAbI+fbwlmzCwOAVfN2kv3NXvpenMiI9Kb/HIHmwFdUhFpRQeXWbaiuCsoWfYWpfTuOZn6MtUsX3D/+eNY2IoYOwbf/AJHDh2OMbYF67BhRaWmBS2wNBuwpKSiKgqZpYRUcAHO/fYPNBae+IXGulnvW9eN9KsNNp775zqAYuHHYw3IOo4lrlmEAsHLuj+Qs2cd5w9pz8S09wu4/f3Ojut1olZX4jx2jcvNm1AoX5WvXYIqL59iiRRjsdjS/D2/e3jM3ZDCAqhJx/vkYW7XCX1SE47JfYYqLR60oxzFsGKa4OPzl5eD3Y4qLa5wONqCde3PZvGftad9/ZeffOGI6++G44e7YM74/otMY0i//8znXJxpH2IXBr68YzRdfLTjrcpqqseTfW9ix7hBdBsRz+R29MZ7iWnERXvylpWiqitHhwLl8OZo3cCy+9JNPMbWOx+90Urboq8C9EyYTvsLCM7ZnjIvD1qMHqtOJ6nLR6u67UcvLUV0uokZdhrldO1SXC7W8AlNcqyZ5P4bP58VZddf2qTz4wa/ZY3Jypq9TRVVhklJ59hPbHS0dybh93rmWKX6hsAuDc+mQpmms/Ww3G77Ko2X7SK75QwqOWLkKQxzn3rkT1eXCEOmgfMVyvIcPBwb9UxSK/vFm4Gqo6CgqczbVqb3o0deglleglpejeb20/tMDgbCoCMyzJidj79u3ZnlNVQGa/Iny5/9zO2sr1p91ud1VF/n1cZvgjPESkGhsx1/v/OIXViegmYdBtf99sYfvF+4BBcY+lEq7bi0aqDoRrjSPB/fOnSg2O4bICMqWLMGbX4AhMhJDRERghFmvF2NM
DIaICNw7dpy2LXNSUiAcKirQKgJDdLd+5JHAPFdgvmIy0/pPDwBUzXehulwoFkvgHo4m6tPvZvCfH99E4+wfIUeMPo4YFfq57XVuP9YQw9/u+FqGCTmFJhMGa9as4fPPP+e5554743L17VDh3jIWvJqNx+3j/Ks6M/DKjnIeQTQY1eUKXCllsWCIiMAQEUFFVhbOZcsxOByBeXY7R+fPRz12rGY9xWpFc7vP2LbjkkswxbVCrQgEhFbpQq1wEXvLLUSkDqgKjsrAfJcLzeMhYvBgjNHRDd3toPr0uxn8a+cbqHUIDgCnQaXQZGCYuwVm5dyGK1dQuGHQgwwbcHV9Sm0SmkQY5OXlsWTJErZu3cpLL710xmV/SYfKS90seXcL+dtKiG0XyaCrOpJ8vlx/LfSjaRr+4uLAHofdhmI0olZWUjxrFvhVDBF2FLsdgz0C7759FP3jHyhWK8boaJSIwHyDzYYrO/us24q5djSqqxK10oXmqkStrAqM8gri7r+PiAsuQKsMzFddLjS3G9XlAp8vcIVWVFQj/ETqb/GaD5m2JQNPPb7jlRoN9K+0kNpiSL2336/TxVx2wQ31Xr+hNYkwqPbwww83aBhA4MTy5pX7+d/ne6g45iF5cBsuvrUnZovsVoqmy7liBZ68vRjsNgx2e024GOx2CiZOxH+0FKPDgWK3YbDZMdhsVSFjp2zx4rO2r1gstLjxxkCQVLrR3JWolW40l6vmSq/YW24hNn08mtdbM0+tWbYSze1Bc1cGhixJSmqEn0rdjZnZj12WX/axlujVeOvXnwSpImjTKimod5pLGJyG1+Nn+Yfb2bbmIEaTgUsm9KTHBbKXIJof5/LleAsKAgFis6LYqgLFasVgs5F32/+hlpVhiI7GYLUGQsRatVzVdPmKwLNFMBrB7z/rNtu/+EKtgFDd7tqvK91oHjfWXr2Iu/vuOvVD0zTQtHqdjD9aVsRP+7ed83rV3lr2OCusR+q9/qkMqozg3XvWBa29hgiDcxpMJicnh5deeonZs2ejqipTpkxh+/btWCwWMjIy6NixY1CLqyuzxciv/u88kge3ZfE7m1ny7hZylxUw8uZk4hJDe3dYiGByjBhxxvd7/O/7s7ZRtmQJFf9bHwgImxXFakOxWgJ7INbj84pm/BPX+iz2PzLx5EbMZgwWC4rVimKz4tt/AL5chCsnp2pvxI3qqQ6N6unjrzWPB1PbtnT7ZjGK2Yzm96NVva+6PWge9wnTVe14A9MGn5++Fw6t93mV35leJmHN61DH8xtn871rA9vMTu5+68KgtBcQ/C+7dd4zmDlzJgsWLMBut5OZmcnixYtZunQp06ZNIzs7mxkzZvDmm2+e08YbIt28Hj/rv9hDztJ8/F6VviMTGHJd12b39DQhGprm9eLesyfwoW+zBfY+rFYUi+WkQQuPffU1hX//eyAcrNbjQWG1BsLFYq2atmCwWqncspXyVatAUQJ7KL5Tjxp7OtFXXUnLO+4MBIbnhPDweKrCxFPrPbXmtRfN7SZi8GBirv51UH5Of8/8A4uOfRekaAlwfJms355BUlIS06dP55FHHgEgKyuL4cOHA5CSkkJu7tlvk28MZouRodd1I+WyJNZ+uosflhewZfUBUi7rwKArO2GS8wlCBIViNmNLrtsQMdFXpBF9RVqd2/YWFFD873+jGIyBcKkKCcVSFTZWKwarpea1Yjk+/dOtEzj25SKOfbnoHDqjVLVjQXO5OJqZCXA8MLzeqr89tebVCpETlzvhz2ivl1uv+h1x991X93rOYuyXY4PWVrU6h0FaWhr5+fk1006nE4fDUTNtNBrx+XyYQmQYY3uUhUsm9KL7+W1Y/tEOshblkbUoj76XJHLh2K6YzBIKQoQqc0ICbR97rF7rdnx3Ft4DBwJBYQ4ERE1w1IRH1fyqvzGZai5PP/DUZI5mZrL/4YdPu41A2+bjbVb/qZlnDlw9FhOD64cfKHz1NRSrLRAW3p8Fx+nmneE9
4oM/tEq9P7kdDgfl5cfHd1dVNWSC4ESJPVuS/uRgDuwqZd2C3fzwXT4/fn+Iwdd0ps+IBBQZCVWIsGLv3x97//71Xr/N448Re/NNJ3/Amy0YLOZzHlV3/+OPUzpvPodffPH4TJPpeNtmM4ol8LfBYgm0bzZjMFsCgWKOqXrfUrMsmzfXu3+nU+9P79TUVL777juuuuoqsrOzSa7j7qIeDEYDCcmxXPdQKjvWHWTtgt0s/2gHyz/awZAxXeh3SQd5XoIQAgCD1YqtZ8+gtdfu2Wdp/dBDtT/8f+mQJWN1PEz0c6NGjWLVqlWkp6ejaRpTp04NZl0NQlEUegxpR/IFbfl+4R62rzvI2k93s/bT3fQc0paRN/eQcwpCiKBSDAZMLVvqXcZZNbmxiYJJVTUKtpew9L2tOEsCwwZ0H9Sai27oTmSMDIInhAhNut9nEG4MBoUOvVpy29QL2bQ0n9zlBfy4/jA/rj8MQK8L2zHy5h4Y6zA+vBBCNGXNOgyqKYpC/191oN+lifz4v0Os/PhHXGVetq4+wNbVBwC4YHQXGRBPCBG2JAxOoCgKyYPbkjy4LX6fysLpORRsLwFg3YLdrFuwG4DLbj+P5MFtJBiEEGFDwuA0jCYDYx4cAEDZkUo2Lt7LlpX78ftUlry7hSXvbgGgz4gEhozpgjXi3IbZFUKIUCJhUAdRLW2MSE9mRHoyh346Rs6SvezaUIiqauQuLyB3eQEArRIiueTWXrTp3LTGmhdCCAmDc9SmUzSX39UHALfLx+YVBWxbc5CSA+UUF5Qz94Xjjwts3TGKtN/2IbpV3Z/uJIQQepAw+AWsdhOpl3ck9fKOaJpG/rYSfvhvPnmbi1F9Gofzypj9+Jqa5Tuc15LBV3embZcYHasWQoiTSRgEiaIELlPt0Ctwc0lluZel721lT05RzTL7thxh35bj46RHtbLRd2QiyYPbENlC7msQQuhHwqCB2CLNXHVfv5ppd4WXDYv3svHrPKpv8ysrrmT1/J2snr+zZrnoOBt9L06k10Xtsdrln0cI0Tjk06aRWCPMDB3TlaFjugKBJznt//Eo29cdZPfGQtwVgfHajxVVsmruTlbNPR4QcR0c9B6eQNfUeOwOiy71CyHCm4SBThRFISE5loTkWC6d0AsIBMTB3cfYtHQfO7MO1yxbtM/Jsg+2s+yD7TXzYuLtxLaLpGtqPB16tZThM4QQv4iEQQhRFIV2XWNo1zWGtKpHxWqaxoGdpez4/iCHfjpG0T4nAKWFLkoLXfy0qahWG227RBPbNpK2XWPo2LsV9igzBqMMpyGEODMJgxCnKArtu7egffcWteb7vSoHdpeyb0sxu7OLOHqoAoCDu49xcPexmmE0qtmjzLTuGE3rTtG07hhF2y4x2CLlRjkhRICEQRNlNBtI7BFLYo9Yhl7XDQjsRXjdfooLysnLLaJon5OifCflR924yrzk5RaTl1tcqx2zzUhcooPWHaNJ7BFLfMcoOeQkRDMkYRBGFEXBYjPVHGo6kaZpOEvcFO4tI39bCXs3F1Na6MJb6efAzlIO7Cwl59t9tdaxRphI6t0Ka4SJmHg7rTtG0bK9A4vdhEGeECdEWJEwaCYURSGqpY2olja6pMTXeq+y3EtxvpP9O49ycHcph/Ycw13hw13h48f/HTple/YoMyaLEUeslU794lAUhbgEBy3bR2K2GbHY5FdLiKZE/scKbJFmEnrEktAj9qT33C4fFaVuSg5UsH/XUYr2lmE0G9m7uRjwUlZcyYGdpadsNyLGgt1hIaqVjfbdW2AyG2jXLQbFoNCiTQRGObEtRMiQMBBnZLWbsNpNxLaNpMuA2nsUmqZRftSD6lcpLnBSvL8cv1flx/WHiIm3s3fzESpKPRQXOE+66gnAbDXidfvpmtoaZ0klvS5sh6fST2zbCKJb2VFVjVbtI1HkkJQQDU7CQNSboig4YgMnm6Pj7HTuHwiLC0Z3qbVcaaEL1a9ycHcpmgp7cgqx2E3s
zi7EGmli14bAPRWH9hw75XYsNiOeSj9tOkcT1dKGs8TNecPaoWmgKNA1tTU+j4rfp+KItcpzJoSoBwkD0eBi4gOjtsa2jQTgvGHta72vaRqlh10YzQaOFbkoOViByWJgy8r9OGJtaKrGzqzDHD1cURMYB3cfPzS19L1tNa8VBVp3isZd4SOqpZWk3q3wuHz4vCo9h7TDYFTwVPqwR1mIamlr6K4L0WRIGAjdKUrgHAIEnh2RkBw4d9FzSLuaZU68Ca+4oBwAk9nA9nUHMVkMmMxGsr76iahWdiw2I4f2HOPooQr2bS2paWPj4r21ttumczRet5/Kci8t20XSqW8cXrcPr9tPp75xtGwfiafSj9ftx2o3yWCCIqxJGIgmRVEU4hIdNdMnHpLq/6sONa89lYGroSx2E2arkR3rDuLzqpitRg7/dIwDu0qx2IxEtrCye2MhFaUe8rcdD44NX9cODoDO/ePw+zS8bh+eykBADLg8Ca/bj88TCI0WbSJIOq9VzTqqqqGpGkaTnCwXoU3CQIQli81U6/LWnkOP72X0uKBtrWU9Lh+VFV4stkBw7N9xlIN7SjFbjYHwyCtjd3YhJQcrsNiMmG0mivMDh6n2/3j0pG07WlrxuVW8bj9+nwpAv0sTiYi24K304/X48bn9GC1GLhrbDQ0tsLwnECo+r0rL9pFytZVoVEENgw0bNjBnzhwAHn/8caKj5fGPIvRZ7CYsJwwX3uG8lnQ4r2XNdO/hcMmtPWut467wUri3DFNVYJgtRorynexYdxCz1RiYbzFyrMjFro2FbFqaDwTOaZisRryVfgB++C7/lDWZLAb6jEjA61EDAeHx43UHTpIPuDyJtl1iaub7PNVBoqL6VNp1a4HZagz2j0mEuaCGQWZmJs888wybNm3iyy+/JD09PZjNCxEyrBFmEnu2rDUvOs5+0g19ELipD8BsMWIwKSiKgsflY+M3gUNRJosBk+V4qCx+ZzM+j0ruiv2Yq94zWYyYLQYO55VRsL3kpG2cyGQx0HtYAj5vICCq9za8bj+qX2PodV1p1y0Gv1fF51UDy1Qv6/Xj96q0SnDI2FXNTFDDwO/3Y7VaiY+PZ+3atcFsWogm61Qfqha76aRLcKt1G9gaFE55ieyuDYdrrraqDojqsDBZDCx4NRufR2XL6v2YzCe8ZzZgMhs4uLuUT17ecNaajWYDw67vVjssvCr+EwLD51Vp3SmaQVd2Om07qj+wnN+rYrYZMZlljyVUBTUM7HY7Ho+HwsJC4uLigtm0EM3GmW6y65ra+ozr3vv6xYE2TnOvRe6yfEqLKquCInAVVuBvA8aq14vf3ozX7WfZhztOKIqqQAksYzQbKD3sYk9OEXtyivBXh4VXrfW3pmo1TUTGWLj+0UE194RUL+f3qvh81a+r2vGpRMZYz9pfETx1DoOcnBxeeuklZs+ejaqqTJkyhe3bt2OxWMjIyKBjx47ceOONPPXUU3i9Xp555pmGrFsIcQpnu+Guz8jEs7Zxx1+HUXHMg7F6z8JswGBUTmo7L7eYjYvzMJoMGM0WTBZjYB2zoebv6pBZ/8Ueyks9/HvS6nPqz4j0ZBQF/D6tZo/E79OOB4jveKBUB4ymagy+uvMph1c5G03TUP3aKfsb7uoUBjNnzmTBggXY7YGbh5YsWYLH42HOnDlkZ2czbdo03nzzTfr06cO0adMatGAhRMMyWYxEx9nPulzHPq3o2KfVWZcD6D6oNXtyiqqCw4DRVDs0jCe+NhnZk1PI8o92sPyjHSe1ZTApx9c3GWrarJ4+sKuUT/+2kZ5D2gbCwqfV7G2o1XskvhP+VAdM1TQadOoXx6/v73eKntSm+muvG9iGVrttv0bLdpFERIf2I2vrFAZJSUlMnz6dRx55BICsrCyGDx8OQEpKCrm5uQ1XoRCiyXPE2uh78dn3Sqr1GZlAUu9WKAq1PuiNJsNZx6r69G8bKcovI39HSc061esbTAYi7Kaq+UpNMJ0YUruz
C/lpUxHzXlx//IPeq+Kv+uBXfcf3RDTtjKXUSOwZy7V/GnDS/Oo9kZoQ8Qfarp7n91W9/tn2G0KdwiAtLY38/OOXwDmdThyO4zf+GI1GfD4fJpPctiCE+OUURakZxuRcjXnw5A/dc9GyXSQ/LMvHaDJgjawKCmNgb8RwYnhU7aGcOG0w/TxcFNZ9tpv8bSW899hq/P4T9hyqXoeKen16OxwOysvLa6ZVVZUgEEKEhe7nt6H7+W2C1p7PExjJt1aomKr3VKoCxWjAaFYwGGsHjeGkdQKvv/1D0MqrUa9P8NTUVL777juuuuoqsrOzSU5ODnZdQggRFrqkxJ/y/pNQU68wGDVqFKtWrSI9PR1N05g6dWqw6xJCCNGI6hwGiYmJZGZmAmAwGOTSUSGECCMyEpYQQggJAyGEEBIGQgghkDAQQgiBhIEQQggkDIQQQqDzYy8LCgoYO3asniUIIUSTU1BQEPQ2FU2r61BLQgghwpUcJhJCCCFhIIQQQsJACCEEEgZCCCGQMBBCCIGEgRBCCHS6z0BVVaZMmcL27duxWCxkZGTQsWNHPUo5Jzk5Obz00kvMnj2bvLw8Hn30URRFoXv37kyePBmDwUBmZiYfffQRJpOJ++67j0suuYTKykr+8pe/UFxcTGRkJC+88AItW7YkOzub5557DqPRyLBhw/j973+vS7+8Xi+PPfYYBQUFeDwe7rvvPrp16xY2/fP7/TzxxBPs2bMHo9HI888/j6ZpYdM/gOLiYsaOHcusWbMwmUxh1TeAMWPGEBUVBQSG07/33nvDpo8zZsxg6dKleL1ebrrpJgYPHqxP3zQdfP3119rEiRM1TdO0jRs3avfee68eZZyTt956S7v66qu1G264QdM0Tbvnnnu0tWvXapqmaU8++aS2ePFi7fDhw9rVV1+tud1u7dixYzWvZ82apb322muapmna559/rj377LOapmna6NGjtby8PE1VVe2uu+7ScnNzdenb3LlztYyMDE3TNO3IkSPayJEjw6p/33zzjfboo49qmqZpa9eu1e69996w6p/H49Huv/9+7fLLL9d27twZVn3TNE2rrKzUrr322lrzwqWPa9eu1e655x7N7/drTqdTe+2113Trmy6HibKyshg+fDgAKSkp5Obm6lHGOUlKSmL69Ok105s3b2bw4MEAjBgxgtWrV7Np0yYGDBiAxWIhKiqKpKQktm3bVqu/I0aMYM2aNTidTjweD0lJSSiKwrBhw1izZo0ufbviiit44IEHaqaNRmNY9e+yyy7j2WefBWD//v3ExcWFVf9eeOEF0tPTad26NRBev5sA27Ztw+Vycccdd3DbbbeRnZ0dNn1cuXIlycnJ/O53v+Pee+/l4osv1q1vuoSB0+nE4XDUTBuNRnw+nx6l1FlaWhom0/GjapqmoSgKAJGRkZSVleF0Omt2ZavnO53OWvNPXPbEn0H1fD1ERkbicDhwOp388Y9/5E9/+lNY9Q/AZDIxceJEnn32WdLS0sKmf/Pnz6dly5Y1HwgQXr+bADabjTvvvJN33nmHp59+mocffjhs+lhSUkJubi6vvvqq7n3TJQwcDgfl5eU106qq1vqgbQoMhuM/uvLycqKjo0/qV3l5OVFRUbXmn2nZ6OjoxuvAzxw4cIDbbruNa6+9lmuuuSbs+geBb9Bff/01Tz75JG63u2Z+U+7fvHnzWL16NRMmTGDr1q1MnDiRI0eOnFRXU+xbtc6dOzN69GgURaFz5860aNGC4uLik+prin1s0aIFw4YNw2Kx0KVLF6xWa60P7sbsmy5hkJqayvLlywHIzs4mOTlZjzJ+kfPOO49169YBsHz5cgYNGkS/fv3IysrC7XZTVlbGrl27SE5OJjU1lWXLltUsO3DgQBwOB2azmb1796JpGitXrmTQoEG69KWoqIg77riDv/zlL1x//fVh179PP/2UGTNmAGC321EUhT59+oRF//7zn//w/vvvM3v2bHr16sULL7zAiBEjwqJv
1ebOncu0adMAOHToEE6nk4suuigs+jhw4EBWrFiBpmkcOnQIl8vF0KFDdembLgPVVV9NtGPHDjRNY+rUqXTt2rWxyzhn+fn5/PnPfyYzM5M9e/bw5JNP4vV66dKlCxkZGRiNRjIzM5kzZw6apnHPPfeQlpaGy+Vi4sSJFBYWYjabefnll4mPjyc7O5upU6fi9/sZNmwYDz74oC79ysjIYNGiRXTp0qVm3uOPP05GRkZY9K+iooJJkyZRVFSEz+fj7rvvpmvXrmHz71dtwoQJTJkyBYPBEFZ983g8TJo0if3796MoCg8//DCxsbFh08cXX3yRdevWoWkaDz74IImJibr0TUYtFUIIITedCSGEkDAQQgiBhIEQQggkDIQQQiBhIIQQAgkDIYQQSBgIIYQA/h/Q2Cutc0YWQgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df = s['count'].to_frame('freq')\n",
    "factors = [1, 1e-2, 1e-4, 1e-6, 1e-8]\n",
    "for f in factors:\n",
    "    sf = make_sampling_table(vocab_size, sampling_factor=f)\n",
    "    df[f] = df.freq.mul(sf)\n",
    "df.loc[:, factors].plot(logy=True, xlim=(0, 60000));"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:14.105553Z",
     "start_time": "2020-06-21T17:16:14.102344Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "sampling_table = make_sampling_table(vocab_size, sampling_factor=SAMPLING_FACTOR/10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:16:14.314660Z",
     "start_time": "2020-06-21T17:16:14.106505Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3deVxU9f4/8NcsbDIo4loqXkHRvJqIZpmgYpGWC64XXLCbuWV1v6WRel0yJKTd8t4seeT3dm0RfrilaZaakbhCoFFXUSQ2F1BAmBmGGTif3x/afOW6jChwZs68nn9xljnn/Rl4nBefz9lUQggBIiIiO6OWuwAiIqKbYUAREZFdYkAREZFdYkAREZFdYkAREZFdYkAREZFdYkDRLWVmZiIqKgqjR4/GqFGjMHPmTJw+fRoAcOTIEYwaNeqGz/zyyy/429/+dlf7O3bsGGbOnInhw4djxIgRGDt2LLZt23ZPbaivhIQEhIeHY8yYMRg1ahTefPNNmM3mRt/v5s2bMWfOHADAkiVLcPDgwQbbbr9+/RAeHo6xY8ciPDwckZGRyMjIqPe2unfvjtLS0np9JioqCt9+++0N8y9evIjIyEgAwJo1axATEwMAmDVrFs6cOQMAmDFjhnV/188nJyKIbqK6uloMGDBAZGVlWedt3bpVDBkyRNTU1IjDhw+LkSNHNtj+9u/fLwYPHiyOHTtmnVdQUCCeeOIJsXv37gbbz+3s3LlT/OUvfxFVVVVCCCFMJpOYPXu2ePfddxt935s2bRKzZ89uku3u3btXDBo0SFgslnptKyAgQFy+fLlen5k2bZrYtWvXbdf58MMPxeuvv94g+yNl0codkGSfqqqqUFlZCaPRaJ03ZswY6HQ61NbW1lk3LS0Nr7zyCt577z1YLBasXLkSO3bswKJFi+Dm5oaTJ0/i8uXLGDRoEJYuXQoXF5cb9vfOO+9g8eLF6N+/v3Vex44d8cYbb1hrWLRoEcrLy1FQUIChQ4di4sSJiImJgcFgQElJCXr06IHVq1fDzc0NvXv3xjPPPIODBw/CaDTihRdewLfffovs7Gy0bdsWH3/8MZo1a1anhpKSEtTW1sJkMsHd3R1ubm5YtmyZ9b/43Nzce95fz549MWvWLPz0008wGo2YP38+nnjiiTp1REVFYerUqejVqxf++te/YsiQITh+/DgqKioQHR2NsLAwVFVV4bXXXsPx48fh5eWFrl27AgDi4+Nt/m4HDhyIkpISVFRU4K233qrznc6dOxevv/46Tp48CZVKhZCQEMyfPx9a7dVDxerVq/HLL79AkiS89NJLCA0NhdFoxIoVK5CXl4fy8nJ4enrinXfegZ+fHwDg+++/x7p162AymTB69Gg899xzKCwsxOjRo2/oyQ0bNgwffPABvvzySwDA008/jXXr1mHq1Kn44IMP0Lt3b+zbtw9r166FxWKBu7s7Fi5ciL59+yInJwdLliyB2WyGEAITJ07E1KlTbX4fZL84xEc31aJFC0RHR2PmzJl47LHHEB0djU2bNuHRRx+Fq6urdb3Dhw9j8eLF+PjjjxEUFHTDdk6cOIH169dj586dyMnJQWJi4g3rVFRUIDs7G8HBwTcs69+/PwYPHmydNplM+OabbxAdHY2kpCSMHTsWSUlJ+O6771BYWIj9+/cDAMxmM1q3bo3k5GSMHTsWS5cuxZIlS7Bz507o9Xrs3bv3hn2NGzcOzZs3R3BwMCIiIhAfH4/z58/jwQcfBIAG2V9tbS08PDywefNmrF69Gn//+99vO2xWUFCA4OBgJCcnY8GCBYiLiwMAfPTRR6itrcWuXbvwr3/9C7/99tstt3E9IQQSExMREBAAHx+fG77T2NhYeHt7Y/v27di0aRNOnTqF9evXWz/fsWNHbNmyBW+//TYWLVqE0tJSpKSkoHnz5khMTMTu3bvRq1cvfPHFF9bPGAwGJCUlISkpCV9/
/TV+/PFHm3WuWrUKAPDZZ5/hvvvus87//fff8f7772PdunXYunUrVq5ciRdffBFGoxGffvophg0bhs2bN2PdunVIS0uDJEl39L2QfWIPim7pmWeewaRJk3Ds2DEcO3YMCQkJSEhIQHJyMgDgwoULmDt3LiZPnowePXrcdBvjxo2Dp6cnACA8PBx79+7FtGnT6qwjrj1tS6VSWee99NJLyM3NhcViQatWrbBhwwYAQL9+/azrREdHIzU1FQkJCfj9999RXFxcp8c3fPhwAICvry8CAgLQrl07AFcPsleuXLmhVi8vL6xfvx4FBQU4fPgwjh49itmzZ2PKlCmIjo5usP390f4ePXogICAAx44du8VvAHBxccGQIUMAAD179kR5eTkA4Mcff8TixYuhVquh0+kwbtw4nDp16qbbSEtLQ3h4OFQqFcxmM/z8/PDhhx9al1//naakpOCrr76CSqWCq6srIiMj8dlnn2H27NkAgMmTJwMAAgIC4O/vj4yMDIwYMQKdOnXChg0bkJeXh6NHj6Jv377WbU6cOBFarRY6nQ7Dhw/HwYMH4e/vf8s2305qaiqKi4vx17/+1TpPpVIhPz8fYWFhWLhwIU6cOIGBAwdi6dKlUKv5P7gjY0DRTaWnpyMjIwMzZ85EaGgoQkNDMX/+fIwaNQqpqalo2bIlNBoN1q1bh3nz5mHEiBHo06fPDdvRaDTWn4UQUKvV2Lt3r/UA2bZtWyQkJMDf3x9Hjx5FaGgogKtDScDVizFWrlxp3cb1w3Lz589HbW0tnnzySQwdOhTnz5+3hh2AOkOJNxtW/G8JCQno168fgoKC0KlTJ0yaNAlpaWmYNWsWoqOjG2x/138nkiTVmf5vLi4u1oPs9QGu1Wrr7Pt2B+L+/fvjk08+ueXy679TSZLq7EeSJNTU1Nx0P5IkQavV4ssvv0RSUhKmTp2K0aNHw9vbG4WFhTdtrxDCOlx4NyRJwsCBA61/HwBw/vx5tG3bFj169MDu3btx8OBBHDp0CP/85z+xefNmtG/f/q73R/Livxd0Uz4+Pli7di3S0tKs80pKSqDX6xEQEAAAaNOmDYKCgrBw4UK8+uqrqKqqumE7u3btgtlsRnV1NbZs2YLQ0FA89thj2LZtG7Zt24aEhAQAV88vxcbG4ueff7Z+Vq/XY//+/bc8+B44cADPP/88nnrqKQDA8ePHbzg/Vh8mkwnvvvuutZcCANnZ2ejZs2eD7m/r1q0AgF9//RW5ubl46KGH6r2NIUOGYNOmTZAkCVVVVdixY0edYLlbwcHB+PzzzyGEgNlsRlJSEh599FHr8i1btlhrz8/PR58+fXDgwAGMGzcOkyZNQpcuXbBv374638vWrVshhMCVK1ewa9cuhISE3FEtGo2mTjgCV8+fpaamIicnB8DVnuSYMWNgMpmwYMEC7Ny5EyNHjsRrr70GnU6H/Pz8e/1KSEbsQdFNdenSBf/85z/x/vvv48KFC3Bzc4OXlxfi4uLg5+eHkpIS67rjxo3D7t27ER8fbz14/8Hd3R1TpkxBRUUFhg8fjgkTJtx0f4MHD8Z7772HtWvXoqioCBaLBUIIDB48+Jb//b/88st4/vnn0axZM+h0Ojz00EP3dECaN28eVCoVIiMjoVKpIEkSevXqZf1vvaH29/PPPyMpKQmSJOH9999HixYt6r2NOXPmICYmBqNHj4aXlxdatWoFd3f3em/nvy1duhSxsbEYPXo0LBYLQkJCMHfuXOvygoICjB07FiqVCu+99x68vb0xY8YMLF++3Dr0GxgYiOzsbOtnvLy8MH78eJhMJkybNg2PPPJInR7WrYwYMQJRUVFYs2aNdV7Xrl0RExOD+fPnW3tja9euhaenJ+bNm4clS5YgMTERGo0Gjz/++F2FP9kPlRB83QY1jkWLFqFbt2549tln5S7FbnTv3h2HDh2yXqBwt7755hvodDoMGTIEkiThxRdfxKBBgzBlypQGqpRIfhziI3JA3bp1w9q1
axEeHo5Ro0ahbdu2mDRpktxlETUo9qCIiMgusQdFRER2iQFFRER2qcmv4nv44YfRoUOHpt4tERHZqaKiIhw5cuSG+U0eUB06dMDmzZuberdERGSnxo8ff9P5HOIjIiK7xIAiIiK7xIAiIiK7xIAiIiK7xIAiIiK7xIAiIiK7dEcBdfz4cURFRd0wf9++fZgwYQIiIiKQlJTU4MUREZHzsnkfVEJCAr7++mt4eHjUmW+xWLBq1SokJyfDw8MDkydPRmhoKNq0adNoxRIRkfOwGVC+vr5Ys2YNXn311Trzc3Jy4Ovra32XTb9+/ZCWloYnn3yycSp1UjW1Etbuz0GZ0SJ3KUREdYwP6oBeHer/PrM7ZTOghg8fftOXi+n1enh5eVmnPT09odfrG7Y6wqafC/Hu99nQuWlx7+9LJSJqOP5tPeUNqFvR6XQwGAzWaYPBUCew6N6ZLLVYvec0Ajt5Y8u8Rxvkld5ERI7irq/i8/f3R15eHsrLy2E2m5GWloa+ffs2ZG1O7/PDeTh/xYRXh3dnOBGR06l3D2r79u0wGo2IiIjAokWL8Oyzz0IIgQkTJqBdu3aNUaNTMlTX4KP9OQju2hqPdm0tdzlERE3ujgKqY8eO1svIR48ebZ0/bNgwDBs2rHEqc3JfHMlDqcGMBU8EyF0KEZEseKOuHTJZarEuJRch3Vqjr29LucshIpIFA8oOJR4rwCV9NV4I7Sp3KUREsmFA2RlzjYRPfszBQ39qiYf9WsldDhGRbBhQdmZLRiHOXTHhhWHd5C6FiEhWDCg7UlMr4aP9OXiwYwsM7sYr94jIuTGg7MiOE+eRd9mIF0K78r4nInJ6DCg7IUkC//jhDHq098LjD/B+MiIiBpSd2P3rBZwp1uP50K5Qq9l7IiJiQNkBIQTW7DsDv9aeeKr3fXKXQ0RkFxhQduCHU8X47XwFnhvqDw17T0REABhQdmHt/hx08PbA2L4d5C6FiMhuMKBklp5XimO/l2FmSBe4aPjrICL6A4+IMvv4x7PwbuaCiIc6yV0KEZFdYUDJ6EyxHt//dhHTB/4JzVzv+t2RRESKxICSUULKWbhp1Xh6YGe5SyEisjsMKJlcrDBhS0YR/tK/E1rp3OQuh4jI7jCgZLI+NRc1koRZIX5yl0JEZJcYUDKoMFnw5eF8PNX7Pvi2aiZ3OUREdokBJYOvjuSjsroGc4f4y10KEZHdYkA1seqaWnx6IBfBXVujV4cWcpdDRGS3GFBNbFvGORRXVmPOEJ57IiK6HQZUExJCIOGns3jgvuYI7soXEhIR3Q4Dqgn9dPoSThfrMTO4C19ISERkAwOqCX16IBdtvNwwqg9fqUFEZAsDqomcKa7Ej9kliHqkM9y0GrnLISKyewyoJrI+9Xe4atWY+rCv3KUQETkEBlQTKDOYsfnnQozv24GPNSIiukMMqCbw5dF8mCwSnhnURe5SiIgcBgOqkZlrJPz70O8I6dYa3dt7yV0OEZHDYEA1sl1Z53Gxohoz2HsiIqoXBlQjEkLg0wO58GvjiSEBbeQuh4jIoTCgGlFaXhlOFF7BjEFdoFbzxlwiovpgQDWif6X+jhYeLhgf1EHuUoiIHA4DqpFcrDBh968X8Jf+HdHMVSt3OUREDocB1Ui+PJKPWiEw7ZHOcpdCROSQGFCNwFIr4auj+RgS0AadW3nKXQ4RkUOyGVCSJGH58uWIiIhAVFQU8vLy6iz/+uuvMW7cOEyYMAFffvlloxXqSHb/egHFldWYPpC9JyKiu2Xz5MiePXtgNpuRmJiIzMxMxMfHY+3atdblb731Fnbs2IFmzZph5MiRGDlyJFq0cO43xf77UB46+XhgSEBbuUshInJYNntQ6enpCAkJAQAEBgYiKyurzvLu3bujsrISZrMZQginf8/RyQsVOJpbimkPd4aGl5YTEd01mz0ovV4PnU5nndZoNKipqYFWe/Wj3bp1w4QJE+Dh4YGwsDA0b9688ap1ABsO5cFNq8Zf+neS
uxQiIodmswel0+lgMBis05IkWcPp5MmT2L9/P/bu3Yt9+/ahtLQUu3btarxq7VyFyYItGUUY0+d+tPR0lbscIiKHZjOggoKCkJKSAgDIzMxEQECAdZmXlxfc3d3h5uYGjUYDHx8fVFRUNF61dm5zeiGM5lpMH/gnuUshInJ4Nof4wsLCkJqaisjISAghEBcXh+3bt8NoNCIiIgIRERGYMmUKXFxc4Ovri3HjxjVF3XZHCIENh/MQ2MkbvTs690UiREQNwWZAqdVqxMTE1Jnn7+9v/Xny5MmYPHlyw1fmYA6fLUVOiQHvTuojdylERIrAG3UbyFdH89HcXYuRD94ndylERIrAgGoAZQYzvs26gPFBHeHuopG7HCIiRWBANYBNPxfCXCshcgAvLSciaigMqHskhMDGYwXo6+uNHu2d+x4wIqKGxIC6R2l5ZThTrMfkh3zlLoWISFEYUPfoq6P50LlpMaoPL44gImpIDKh7cMVowTcnziM88H6+lJCIqIExoO7B1swiVNdImDyAw3tERA2NAXWXhBD46mg+endogV4d+OQIIqKGxoC6S5kF5Th5oZK9JyKiRsKAukv/L70QHi4ajObFEUREjYIBdRdMllpsP34OT/ZuDy93F7nLISJSJAbUXfjut4uoNNVgYlBHuUshIlIsBtRdSE4vRAdvDzzi10ruUoiIFIsBVU8Xrphw4HQJJgR1gFqtkrscIiLFYkDV0+aMQkgCmNCPw3tERI2JAVUPQggkpxdiwJ980LmVp9zlEBEpGgOqHjIKynG2xICJ7D0RETU6BlQ9JKcXwt1FjSd7t5e7FCIixWNA3SHrvU+97uO9T0RETYABdYe+/+PeJw7vERE1CQbUHdqWWYT2zd0xkPc+ERE1CQbUHSgzmLH/VAnGBN7Pe5+IiJoIA+oO7Mw6jxpJYEyf++UuhYjIaTCg7sC2zHPo2laHP9/fXO5SiIicBgPKhnPlVTiaW4rwPvdDpeLwHhFRU2FA2fD18XMAgDGBHN4jImpKDCgbtmWeQ2Anbz7aiIioiTGgbiP7YiX+c74C4ew9ERE1OQbUbXydeQ5qFTDqQQYUEVFTY0DdghAC244XYVDX1mjj5SZ3OURETocBdQsZBeUoKK3ivU9ERDJhQN3CzhPn4aJR4Yk/88nlRERyYEDdhBACu7IuIKRbG7Tw4JPLiYjkwIC6ieOFV1BUXoUne7H3REQkF62tFSRJwooVK3Dq1Cm4uroiNjYWnTt3ti4/ceIE4uPjIYRAmzZt8Pbbb8PNzbEvKtj1y7XhvZ4MKCIiudjsQe3ZswdmsxmJiYlYsGAB4uPjrcuEEFi2bBlWrVqFr776CiEhISgqKmrUghubEALf/HIeg7q2RotmHN4jIpKLzYBKT09HSEgIACAwMBBZWVnWZbm5ufD29sZnn32GadOmoby8HH5+fo1XbRP4pegKCsuq8FSv++QuhYjIqdkMKL1eD51OZ53WaDSoqakBAJSVlSEjIwNTpkzB//7v/+Lw4cM4dOhQ41XbBHb+cgFatQpP/Lmd3KUQETk1mwGl0+lgMBis05IkQau9eurK29sbnTt3RteuXeHi4oKQkJA6PSxHI4TAzl/OY6B/K3g3c5W7HCIip2YzoIKCgpCSkgIAyMzMREBAgHVZp06dYDAYkJeXBwBIS0tDt27dGqnUxvfruQrklxoxsjeH94iI5GbzKr6wsDCkpqYiMjISQgjExcVh+/btMBqNiIiIwBtvvIEFCxZACIG+ffti6NChTVB249j5y3lo1Lw5l4jIHtgMKLVajZiYmDrz/P39rT8PHDgQycnJDV+ZDHb/egGP+PnAx5PDe0REcuONutecLdEjp8SAsAd4cQQRkT1gQF3z/W8XAQCP92RAERHZAwbUNd//dhE972uOji2byV0KERGBAQUAuKSvRnp+GcLYeyIishsMKAD7/lMMIcCAIiKyIwwoAN/9dhEdvD3w5/uby10KERFd4/QBVWWuxYEzJXj8
gbZQqVRyl0NERNc4fUD9dLoEJouEML5ag4jIrjh9QH3/20V4uWvxsJ+P3KUQEdF1nDqgJElg38lihHZvCxeNU38VRER2x6mPyieKruCywYzHHmgrdylERPRfnDqg9p8qhkoFhHRrI3cpRET0X5w8oErQp6M3Hw5LRGSHnDagLuurcbywHEO7s/dERGSPnDagfjp9CUIAod15/omIyB45bUDtP1WMVp6u6N2hhdylEBHRTThlQNVKAimnL2FwQBuo1Xx6BBGRPXLKgDpRWI5Sg5nnn4iI7JhTBtT+UyVQq4DBvLyciMhuOWdAZZegTydvtOTl5UREdsvpAuqK0YITheXsPRER2TmnC6hDZ69eXh7crbXcpRAR0W04XUClnrkMT1cNAjt5y10KERHdhhMG1CU87NeKTy8nIrJzTnWUPldehbOXDHjUv5XcpRARkQ1OFVCpZy4B4PknIiJH4HQB1Vrniu7tvOQuhYiIbHCagBJCIDXnMh71bw2Vio83IiKyd04TUKeL9SiprEZwVw7vERE5AqcJqAOnr55/GsTzT0REDsFpAurQ2cvo3KoZOnh7yF0KERHdAacIKEkSOPZ7KR7u4iN3KUREdIecIqBOF+tRbrRgQBfe/0RE5CicIqCO5l4GAAz4E3tQRESOwikC6khuKdo3d0cnH55/IiJyFDYDSpIkLF++HBEREYiKikJeXt5N11u2bBneeeedBi/wXgkhcDS3FAO6+PD+JyIiB2IzoPbs2QOz2YzExEQsWLAA8fHxN6yzceNGZGdnN0qB9yq/1IjiymoM4AUSREQOxWZApaenIyQkBAAQGBiIrKysOsszMjJw/PhxRERENE6F9+hIbikA8Ao+IiIHYzOg9Ho9dDqddVqj0aCmpgYAUFxcjH/84x9Yvnx541V4j47mlsLH0xVd2+psr0xERHZDa2sFnU4Hg8FgnZYkCVrt1Y99++23KCsrw+zZs1FSUgKTyQQ/Pz+MHz++8Squp6O5pejfuSXPPxERORibARUUFIQffvgBTz31FDIzMxEQEGBdNn36dEyfPh0AsHnzZpw9e9auwulihQn5pUZMH9hZ7lKIiKiebAZUWFgYUlNTERkZCSEE4uLisH37dhiNRrs97/SHjPwyAEBQ55YyV0JERPVlM6DUajViYmLqzPP3979hPXvqOf0hI78crho1/nx/c7lLISKielL0jboZ+eXoeX9zuGk1cpdCRET1pNiAstRKOFFUjr6+3nKXQkREd0GxAXXqQiVMFgl9fXn+iYjIESk2oKwXSLAHRUTkkBQcUOVo4+XGFxQSETko5QZUQTn6dvLmDbpERA5KkQFVZjAj95KB55+IiByYIgMqs7AcABDYieefiIgclSIDKqvwCgCgVwfeoEtE5KiUGVDnrqBLa094ubvIXQoREd0lZQZUUQV6dWghdxlERHQPFBdQpQYzisqr0IvP3yMicmiKC6isoqvnn3qzB0VE5NCUF1DnrgbUn+9nQBEROTLlBVTRFXTy8UCLZrxAgojIkSkwoCo4vEdEpACKCqgrRgvyS40c3iMiUgBFBdRv5ysAgG/QJSJSAEUFVPbFSgDAA/cxoIiIHJ2iAurkhUq08HBBWy83uUshIqJ7pKiAyr5Yie7tvfiKDSIiBVBMQAkhkH2hEt3becldChERNQDFBNS5KyZUVtege3sGFBGREigmoLIvXL1AggFFRKQMigmok9cCKoBDfEREiqCYgMq+WIn7WrijhQcfcUREpASKCaiTFyo5vEdEpCCKCChJEjhboke3tjq5SyEiogaiiIAqKq9CdY0EvzYMKCIipVBEQOVeMgAA/Fp7ylwJERE1FEUE1NkSPQCgSxsGFBGRUigjoC4Z4OWmRRsdn8FHRKQUigio3EsGdGnjyWfwEREpiCIC6myJgeefiIgUxuEDymSpRVF5Fa/gIyJSGK2tFSRJwooVK3Dq1Cm4uroiNjYWnTt3ti7fsWMHPvvsM2g0GgQEBGDFihVQ
q5su9/64gq8Le1BERIpiM0n27NkDs9mMxMRELFiwAPHx8dZlJpMJq1evxr///W9s3LgRer0eP/zwQ6MW/N8YUEREymQzoNLT0xESEgIACAwMRFZWlnWZq6srNm7cCA8PDwBATU0N3Nya9kq6/FIjAKBzq2ZNul8iImpcNgNKr9dDp/u/8zsajQY1NTVXP6xWo3Xr1gCADRs2wGg0YtCgQY1U6s0Vlhnh3cwFXu58SCwRkZLYPAel0+lgMBis05IkQavV1pl+++23kZubizVr1jT5pd4FpVXo1JK9JyIipbHZgwoKCkJKSgoAIDMzEwEBAXWWL1++HNXV1fjoo4+sQ31NqaDMiI4tm36/RETUuGz2oMLCwpCamorIyEgIIRAXF4ft27fDaDSiV69eSE5ORv/+/fH0008DAKZPn46wsLBGLxwAhBAoKqvC4w+0a5L9ERFR07EZUGq1GjExMXXm+fv7W38+efJkw1d1h0oqq1FdI7EHRUSkQA59o25B2dUr+HgOiohIeRw6oArLqgCAPSgiIgVy6IAquHYPVEf2oIiIFMehA6qo3AQfT1d4uGrkLoWIiBqYQwdUcYUJ7Zq7y10GERE1AocOqAsVJrRvzpcUEhEpkUMH1MWKavagiIgUymEDylIr4bKBAUVEpFQOG1AlldUQAgwoIiKFctiAulBhAgC0b8FzUERESuSwAVV8LaDaerEHRUSkRA4bUBeu/NGDYkARESmRwwZUib4aGrUKPs1c5S6FiIgagcMGVKnBgpbNXKBWN+0LEomIqGk4bECVGcxoyd4TEZFiOWxAlRrM8PFkQBERKZXjBpSRAUVEpGQOG1BlBjNaMqCIiBTLIQNKkgTKjGZewUdEpGAOGVBXqiyQBNiDIiJSMIcMqFKjGQDQigFFRKRYDhlQV6osAIAWHi4yV0JERI3FIQNKb6oBAOjctTJXQkREjcUxA6r6WkC5MaCIiJSKAUVERHbJMQPq2hCfF4f4iIgUyzED6loPypM9KCIixXLIgDJU18BNq4aLxiHLJyKiO+CQR/jK6hoO7xERKZxDBpShugbNXBlQRERK5pABVW2R4O7ikKUTEdEdcsijvKVW4vknIiKFc8ijvLlWgqvWIUsnIqI75JBH+eoaCYuEsFQAAAdwSURBVK7sQRERKZpDHuXNNexBEREpnc2jvCRJWL58OSIiIhAVFYW8vLw6y/ft24cJEyYgIiICSUlJjVbo9Sy17EERESmdzaP8nj17YDabkZiYiAULFiA+Pt66zGKxYNWqVVi/fj02bNiAxMRElJSUNGrBAHD+iok9KCIihbN5lE9PT0dISAgAIDAwEFlZWdZlOTk58PX1RYsWLeDq6op+/fohLS2t8aq95plH/4THHmjX6PshIiL52LzbVa/XQ6fTWac1Gg1qamqg1Wqh1+vh5eVlXebp6Qm9Xt84lV7nxce6Nfo+iIhIXjZ7UDqdDgaDwTotSRK0Wu1NlxkMhjqBRUREdLdsBlRQUBBSUlIAAJmZmQgICLAu8/f3R15eHsrLy2E2m5GWloa+ffs2XrVEROQ0bA7xhYWFITU1FZGRkRBCIC4uDtu3b4fRaERERAQWLVqEZ599FkIITJgwAe3a8dwQERHdO5sBpVarERMTU2eev7+/9edhw4Zh2LBhDV8ZERE5NV6rTUREdokBRUREdokBRUREdokBRUREdokBRUREdqnJ35teVFSE8ePHN/VuiYjIThUVFd10vkoIIZq4FiIiIps4xEdERHaJAUVERHaJAUVERHaJAUVERHaJAUVERHaJAUVERHapye+DuheSJGHFihU4deoUXF1dERsbi86dO8tdVr0cP34c77zzDjZs2IC8vDwsWrQIKpUK3bp1w2uvvQa1Wo2kpCRs3LgRWq0Wzz33HEJDQ2EymRAdHY3Lly/D09MTb775Jnx8fJCZmYk33ngDGo0GwcHBeOGFF+RuIiwWC/7+97+jqKgIZrMZzz33HLp27arIttbW1mLp
0qXIzc2FRqPBqlWrIIRQZFsB4PLlyxg/fjzWr18PrVar2HaOHTvW+vLVjh07Yu7cuYps6yeffIJ9+/bBYrFg8uTJGDBggH21UziQ3bt3i4ULFwohhMjIyBBz586VuaL6WbdunRg1apSYNGmSEEKIOXPmiMOHDwshhFi2bJn47rvvRHFxsRg1apSorq4WFRUV1p/Xr18vPvzwQyGEEDt27BArV64UQggxZswYkZeXJyRJEjNnzhRZWVnyNO46ycnJIjY2VgghRGlpqRgyZIhi2/r999+LRYsWCSGEOHz4sJg7d65i22o2m8W8efPEE088Ic6cOaPYdppMJhEeHl5nnhLbevjwYTFnzhxRW1sr9Hq9+PDDD+2unQ41xJeeno6QkBAAQGBgILKysmSuqH58fX2xZs0a6/Svv/6KAQMGAAAGDx6MgwcP4sSJE+jbty9cXV3h5eUFX19fnDx5sk7bBw8ejEOHDkGv18NsNsPX1xcqlQrBwcE4dOiQLG273ogRI/A///M/1mmNRqPYtj7++ONYuXIlAODcuXNo3bq1Ytv65ptvIjIyEm3btgWg3L/fkydPoqqqCjNmzMD06dORmZmpyLYeOHAAAQEBeP755zF37lwMHTrU7trpUAGl1+uh0+ms0xqNBjU1NTJWVD/Dhw+HVvt/o6pCCKhUKgCAp6cnKisrodfrrUMLf8zX6/V15l+/7vXfxx/z5ebp6QmdTge9Xo+//e1veOmllxTbVgDQarVYuHAhVq5cieHDhyuyrZs3b4aPj4/1gAQo9+/X3d0dzz77LD799FO8/vrreOWVVxTZ1rKyMmRlZeGDDz6w23Y6VEDpdDoYDAbrtCRJdQ74jkat/r+v32AwoHnz5je00WAwwMvLq878263bvHnzpmvAbZw/fx7Tp09HeHg4Ro8erei2Ald7F7t378ayZctQXV1tna+Utm7atAkHDx5EVFQU/vOf/2DhwoUoLS21LldKOwGgS5cuGDNmDFQqFbp06QJvb29cvnzZulwpbfX29kZwcDBcXV3h5+cHNze3OmFiD+10qIAKCgpCSkoKACAzMxMBAQEyV3RvevbsiSNHjgAAUlJS0L9/fzz44INIT09HdXU1KisrkZOTg4CAAAQFBeHHH3+0rtuvXz/odDq4uLggPz8fQggcOHAA/fv3l7NJAIBLly5hxowZiI6OxsSJEwEot61bt27FJ598AgDw8PCASqVCr169FNfWL774Ap9//jk2bNiABx54AG+++SYGDx6suHYCQHJyMuLj4wEAFy9ehF6vx6BBgxTX1n79+uGnn36CEAIXL15EVVUVBg4caFftdKiHxf5xFV92djaEEIiLi4O/v7/cZdVLYWEh5s+fj6SkJOTm5mLZsmWwWCzw8/NDbGwsNBoNkpKSkJiYCCEE5syZg+HDh6OqqgoLFy5ESUkJXFxc8O6776JNmzbIzMxEXFwcamtrERwcjJdfflnuJiI2Nha7du2Cn5+fdd6SJUsQGxuruLYajUYsXrwYly5dQk1NDWbNmgV/f39F/l7/EBUVhRUrVkCtViuynWazGYsXL8a5c+egUqnwyiuvoGXLlops61tvvYUjR45ACIGXX34ZHTt2tKt2OlRAERGR83CoIT4iInIeDCgiIrJLDCgiIrJLDCgiIrJLDCgiIrJLDCgiIrJLDCgiIrJL/x9z9hPhu6wkCQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "pd.Series(sampling_table).plot(title='Skip-Gram Sampling Probabilities')\n",
    "plt.tight_layout();"
   ]
  },
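  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sampling probabilities plotted above follow the word2vec subsampling rule. The Keras documentation for `make_sampling_table` gives the probability of keeping (sampling) a word as $p(w) = \\min\\left(1, \\sqrt{s / f(w)}\\right)$. Here $s$ is the `sampling_factor` and $f(w)$ is the word's frequency. The function only receives `size`, so Keras assumes two things: index $r$ holds the $r$-th most frequent word, and frequencies follow Zipf's law. Under those assumptions it approximates $1 / f(r) \\approx r\\,(\\ln r + \\gamma) + \\frac{1}{2} - \\frac{1}{12 r}$, where $\\gamma \\approx 0.577$ is the Euler-Mascheroni constant. The minimal NumPy sketch below recomputes a table under these assumptions. `approx_sampling_table` is a hypothetical helper for illustration and is not part of Keras. The library's internals may also differ across versions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def approx_sampling_table(size, sampling_factor=1e-5):\n",
    "    # Hypothetical helper: Zipf-based approximation of word2vec subsampling\n",
    "    gamma = 0.577  # Euler-Mascheroni constant (rounded)\n",
    "    rank = np.arange(size, dtype=float)\n",
    "    rank[0] = 1  # index 0 is a non-word; also avoids log(0)\n",
    "    # Approximate inverse frequency 1 / f(rank) under Zipf's law\n",
    "    inv_freq = rank * (np.log(rank) + gamma) + 0.5 - 1.0 / (12.0 * rank)\n",
    "    # Probability of keeping a word: min(1, sqrt(sampling_factor / f))\n",
    "    return np.minimum(1.0, np.sqrt(sampling_factor * inv_freq))\n",
    "\n",
    "toy_table = approx_sampling_table(10, sampling_factor=1e-5)\n",
    "# Frequent (low-rank) words are kept rarely, rarer words more often\n",
    "print(toy_table.round(4))"
   ]
  },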
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Generate target-context word pairs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:18:50.774667Z",
     "start_time": "2020-06-21T17:16:14.315608Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "64,510,238 pairs created\n"
     ]
    }
   ],
   "source": [
    "pairs, labels = skipgrams(sequence=data,\n",
    "                          vocabulary_size=vocab_size,\n",
    "                          window_size=WINDOW_SIZE,\n",
    "                          sampling_table=sampling_table,\n",
    "                          negative_samples=1.0,\n",
    "                          shuffle=True)\n",
    "\n",
    "print('{:,d} pairs created'.format(len(pairs)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:43.396812Z",
     "start_time": "2020-06-21T17:18:50.775526Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "target_word, context_word = np.array(pairs, dtype=np.int32).T\n",
    "labels = np.array(labels, dtype=np.int8)\n",
    "del pairs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:43.417197Z",
     "start_time": "2020-06-21T17:20:43.397696Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 6581,  1030, 13977,  5859,  7566], dtype=int32)"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "target_word[:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:43.427912Z",
     "start_time": "2020-06-21T17:20:43.417966Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>target</th>\n",
       "      <th>context</th>\n",
       "      <th>label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>6581</td>\n",
       "      <td>244</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1030</td>\n",
       "      <td>55576</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>13977</td>\n",
       "      <td>757</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5859</td>\n",
       "      <td>1736</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7566</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   target  context  label\n",
       "0    6581      244      1\n",
       "1    1030    55576      0\n",
       "2   13977      757      1\n",
       "3    5859     1736      1\n",
       "4    7566        8      1"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame({'target': target_word[:5], \n",
    "                   'context': context_word[:5], \n",
    "                   'label': labels[:5]})\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:43.783870Z",
     "start_time": "2020-06-21T17:20:43.428851Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    32255119\n",
       "0    32255119\n",
       "dtype: int64"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(labels).value_counts()"
   ]
  },
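  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both label classes have the same count because `negative_samples=1.0` requests one negative pair per positive pair. The toy generator below illustrates that idea in simplified form. It is hypothetical and is not the Keras source. Each non-zero target id is paired with every non-zero id within `window_size` positions on either side, and those pairs get label 1. The generator then builds the same number of negative pairs (label 0) by pairing target ids with ids drawn uniformly at random from `1` to `vocab_size - 1`. The sketch omits subsampling with a `sampling_table` and treats id 0 as a non-word. Note that a random negative can occasionally coincide with a true context word."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def toy_skipgrams(sequence, vocab_size, window_size=2, negative_samples=1.0, seed=42):\n",
    "    # Hypothetical, simplified skip-gram pair generator (no subsampling)\n",
    "    rng = random.Random(seed)\n",
    "    pairs, labels = [], []\n",
    "    for i, target in enumerate(sequence):\n",
    "        if target == 0:  # id 0 is treated as a non-word\n",
    "            continue\n",
    "        start = max(0, i - window_size)\n",
    "        end = min(len(sequence), i + window_size + 1)\n",
    "        for j in range(start, end):\n",
    "            if j != i and sequence[j] != 0:\n",
    "                pairs.append((target, sequence[j]))  # positive pair\n",
    "                labels.append(1)\n",
    "    # Negative pairs: reuse target ids, pair them with random vocabulary ids\n",
    "    num_negative = int(len(labels) * negative_samples)\n",
    "    targets = [t for t, _ in pairs]\n",
    "    for k in range(num_negative):\n",
    "        pairs.append((targets[k % len(targets)], rng.randint(1, vocab_size - 1)))\n",
    "        labels.append(0)\n",
    "    return pairs, labels\n",
    "\n",
    "toy_pairs, toy_labels = toy_skipgrams([3, 7, 1, 4, 9], vocab_size=10, window_size=1)\n",
    "# 8 positive and 8 negative pairs, prints: 16 8\n",
    "print(len(toy_pairs), sum(toy_labels))"
   ]
  },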
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:44.965253Z",
     "start_time": "2020-06-21T17:20:43.784959Z"
    },
    "hide_input": false,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "with pd.HDFStore(results_path / 'data.h5') as store:\n",
    "    store.put('id_to_token', pd.Series(id_to_token))\n",
    "    store.put('pairs', pd.DataFrame({'target': target_word,\n",
    "                                     'context': context_word,\n",
    "                                     'labels': labels}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.277709Z",
     "start_time": "2020-06-21T17:20:44.966123Z"
    }
   },
   "outputs": [],
   "source": [
    "with pd.HDFStore(results_path / 'data.h5') as store:\n",
    "    id_to_token = store['id_to_token']\n",
    "    pairs = store['pairs']\n",
    "target_word, context_word, labels = pairs.target, pairs.context, pairs.labels"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Define Keras Model Components"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "#### Scalar Input Variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.285907Z",
     "start_time": "2020-06-21T17:20:45.278565Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "input_target = Input((1,), name='target_input')\n",
    "input_context = Input((1,), name='context_input')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Shared Embedding Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.296501Z",
     "start_time": "2020-06-21T17:20:45.286720Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "embedding = Embedding(input_dim=vocab_size,\n",
    "                      output_dim=EMBEDDING_SIZE,\n",
    "                      input_length=1,\n",
    "                      name='embedding_layer')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.471671Z",
     "start_time": "2020-06-21T17:20:45.297322Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "target = embedding(input_target)\n",
    "target = Reshape((EMBEDDING_SIZE, 1), name='target_embedding')(target)\n",
    "\n",
    "context = embedding(input_context)\n",
    "context = Reshape((EMBEDDING_SIZE, 1), name='context_embedding')(context)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Create Similarity Measure"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.481466Z",
     "start_time": "2020-06-21T17:20:45.472534Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "dot_product = Dot(axes=1)([target, context])\n",
    "dot_product = Reshape((1,), name='similarity')(dot_product)"
   ]
  },
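  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Together with the sigmoid output layer defined next, the training graph scores each (target, context) pair as `sigmoid(w * dot(e_target, e_context) + b)`, where both vectors come from the same embedding matrix, and is trained with binary cross-entropy. The NumPy sketch below mirrors the layer shapes using toy sizes and random weights; the names and sizes are illustrative assumptions, not values from this notebook. With small random embeddings every prediction is close to 0.5, so the initial loss is close to ln 2 (about 0.693), which gives a reference point for the training loss reported below.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "vocab_size, embedding_size = 50, 8   # toy sizes\n",
    "E = rng.normal(scale=0.1, size=(vocab_size, embedding_size))  # shared embedding matrix\n",
    "w, b = 1.0, 0.0   # weight and bias of the Dense(1) output layer\n",
    "\n",
    "\n",
    "def forward(target_ids, context_ids):\n",
    "    # embedding lookup + Reshape -> (batch, embedding_size, 1) for both inputs\n",
    "    t = E[target_ids][:, :, None]\n",
    "    c = E[context_ids][:, :, None]\n",
    "    # Dot(axes=1) contracts the embedding axis -> (batch, 1, 1)\n",
    "    dot = np.einsum('bik,bil->bkl', t, c)\n",
    "    similarity = dot.reshape(-1, 1)                  # Reshape((1,)) -> (batch, 1)\n",
    "    return 1 / (1 + np.exp(-(w * similarity + b)))   # sigmoid Dense(1) -> (batch, 1)\n",
    "\n",
    "\n",
    "def binary_crossentropy(y_true, y_pred, eps=1e-7):\n",
    "    y_pred = np.clip(y_pred, eps, 1 - eps)\n",
    "    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))\n",
    "\n",
    "\n",
    "p = forward(np.array([1, 2, 3, 4]), np.array([5, 6, 7, 8]))\n",
    "y = np.array([[1], [0], [1], [0]])\n",
    "print(p.shape, binary_crossentropy(y, p))   # (4, 1) and a loss near 0.693\n",
    "```"
   ]
  },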
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Sigmoid Output Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.745622Z",
     "start_time": "2020-06-21T17:20:45.482405Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "output = Dense(units=1, activation='sigmoid', name='output')(dot_product)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Compile Training Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.755472Z",
     "start_time": "2020-06-21T17:20:45.746517Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "model = Model(inputs=[input_target, input_context], outputs=output)\n",
    "model.compile(loss='binary_crossentropy', optimizer='rmsprop')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Display Architecture"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.768719Z",
     "start_time": "2020-06-21T17:20:45.756508Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model: \"model\"\n",
      "__________________________________________________________________________________________________\n",
      "Layer (type)                    Output Shape         Param #     Connected to                     \n",
      "==================================================================================================\n",
      "target_input (InputLayer)       [(None, 1)]          0                                            \n",
      "__________________________________________________________________________________________________\n",
      "context_input (InputLayer)      [(None, 1)]          0                                            \n",
      "__________________________________________________________________________________________________\n",
      "embedding_layer (Embedding)     (None, 1, 300)       18163500    target_input[0][0]               \n",
      "                                                                 context_input[0][0]              \n",
      "__________________________________________________________________________________________________\n",
      "target_embedding (Reshape)      (None, 300, 1)       0           embedding_layer[0][0]            \n",
      "__________________________________________________________________________________________________\n",
      "context_embedding (Reshape)     (None, 300, 1)       0           embedding_layer[1][0]            \n",
      "__________________________________________________________________________________________________\n",
      "dot (Dot)                       (None, 1, 1)         0           target_embedding[0][0]           \n",
      "                                                                 context_embedding[0][0]          \n",
      "__________________________________________________________________________________________________\n",
      "similarity (Reshape)            (None, 1)            0           dot[0][0]                        \n",
      "__________________________________________________________________________________________________\n",
      "output (Dense)                  (None, 1)            2           similarity[0][0]                 \n",
      "==================================================================================================\n",
      "Total params: 18,163,502\n",
      "Trainable params: 18,163,502\n",
      "Non-trainable params: 0\n",
      "__________________________________________________________________________________________________\n"
     ]
    }
   ],
   "source": [
    "model.summary()"
   ]
  },
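  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Although the model has two inputs, both are looked up in the same `embedding_layer`, so the summary lists a single weight matrix with `vocab_size * 300` parameters; the output `Dense` layer adds one weight and one bias. The arithmetic below backs out the vocabulary size implied by the summary:\n",
    "\n",
    "```python\n",
    "embedding_params = 18_163_500        # embedding_layer row of the summary above\n",
    "embedding_size = 300                 # output dimension shown as (None, 1, 300)\n",
    "vocab = embedding_params // embedding_size\n",
    "assert vocab * embedding_size == embedding_params\n",
    "dense_params = 1 + 1                 # one weight for the scalar input, one bias\n",
    "print(vocab, embedding_params + dense_params)   # 60545 18163502\n",
    "```"
   ]
  },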
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Validation Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.783680Z",
     "start_time": "2020-06-21T17:20:45.769808Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "similarity = Dot(normalize=True, \n",
    "                 axes=1, \n",
    "                 name='cosine_similarity')([target, context])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.794270Z",
     "start_time": "2020-06-21T17:20:45.784573Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "# create a secondary validation model to run our similarity checks during training\n",
    "validation_model = Model(inputs=[input_target, input_context], outputs=similarity)"
   ]
  },
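  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setting `normalize=True` makes the `Dot` layer L2-normalize both embeddings before taking the product, so the validation model outputs cosine similarity rather than the raw dot product that the training model passes to the sigmoid. Because cosine similarity ignores vector length, neighbor rankings are not dominated by vectors with large norms. As a rough NumPy equivalent of what the evaluation callback below computes with `validation_model.predict`, the hypothetical helper `cosine_to_all` scores one word against every row of a toy embedding matrix:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def cosine_to_all(embeddings, word_id):\n",
    "    # Hypothetical helper: cosine similarity between one word and every row\n",
    "    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)\n",
    "    return unit @ unit[word_id]\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(1)\n",
    "emb = rng.normal(size=(100, 16))                        # toy embedding matrix\n",
    "emb[7] = 3 * emb[5] + rng.normal(scale=0.01, size=16)  # word 7: same direction as word 5, longer\n",
    "sims = cosine_to_all(emb, 5)\n",
    "nearest = np.argsort(-sims)[1:4]   # position 0 is word 5 itself\n",
    "print(nearest)                     # word 7 comes first\n",
    "```"
   ]
  },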
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.806439Z",
     "start_time": "2020-06-21T17:20:45.795291Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model: \"model_1\"\n",
      "__________________________________________________________________________________________________\n",
      "Layer (type)                    Output Shape         Param #     Connected to                     \n",
      "==================================================================================================\n",
      "target_input (InputLayer)       [(None, 1)]          0                                            \n",
      "__________________________________________________________________________________________________\n",
      "context_input (InputLayer)      [(None, 1)]          0                                            \n",
      "__________________________________________________________________________________________________\n",
      "embedding_layer (Embedding)     (None, 1, 300)       18163500    target_input[0][0]               \n",
      "                                                                 context_input[0][0]              \n",
      "__________________________________________________________________________________________________\n",
      "target_embedding (Reshape)      (None, 300, 1)       0           embedding_layer[0][0]            \n",
      "__________________________________________________________________________________________________\n",
      "context_embedding (Reshape)     (None, 300, 1)       0           embedding_layer[1][0]            \n",
      "__________________________________________________________________________________________________\n",
      "cosine_similarity (Dot)         (None, 1, 1)         0           target_embedding[0][0]           \n",
      "                                                                 context_embedding[0][0]          \n",
      "==================================================================================================\n",
      "Total params: 18,163,500\n",
      "Trainable params: 18,163,500\n",
      "Non-trainable params: 0\n",
      "__________________________________________________________________________________________________\n"
     ]
    }
   ],
   "source": [
    "validation_model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "![Keras Graph](https://s3.amazonaws.com/applied-ai/images/keras_graph_tensorboard.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Create Keras Callbacks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
     "#### Nearest Neighbors & Analogies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.818080Z",
     "start_time": "2020-06-21T17:20:45.807290Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "test_set = analogies_id.dropna().astype(int)\n",
    "a, b, c, actual = test_set.values.T\n",
    "actual = actual.reshape(-1, 1)\n",
    "n_analogies = len(actual)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.829032Z",
     "start_time": "2020-06-21T17:20:45.820269Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
     "class EvalCallback(Callback):\n",
     "    # Reports nearest neighbors and analogy accuracy while the model trains\n",
     "\n",
     "    def on_train_begin(self, logs=None):\n",
     "        self.eval_nn()\n",
     "        self.test_analogies()\n",
     "\n",
     "    def on_train_end(self, logs=None):\n",
     "        self.eval_nn()\n",
     "\n",
     "    def on_epoch_end(self, epoch, logs=None):\n",
     "        self.test_analogies()\n",
     "\n",
     "    @staticmethod\n",
     "    def test_analogies():\n",
     "        # a:b :: c:? -> rank all words by cosine distance to c + b - a\n",
     "        print('\\nAnalogy Accuracy:\\n\\t', end='')\n",
     "        embeddings = embedding.get_weights()[0]\n",
     "        target = embeddings[c] + embeddings[b] - embeddings[a]\n",
     "        neighbors = np.argsort(cdist(target, embeddings, metric='cosine'))\n",
     "        # column index of the correct answer = its rank among all words\n",
     "        match_id = np.argwhere(neighbors == actual)[:, 1]\n",
     "        print('\\n\\t'.join(['Top {}: {:.2%}'.format(i, (match_id < i).sum() / n_analogies) for i in [1, 5, 10]]))\n",
     "\n",
     "    def eval_nn(self):\n",
     "        print('\\n{} Nearest Neighbors:'.format(NN))\n",
     "        for i in range(VALID_SET):\n",
     "            valid_id = valid_examples[i]\n",
     "            valid_word = id_to_token[valid_id]\n",
     "            similarity = self._get_similarity(valid_id).reshape(-1)\n",
     "            # skip position 0, which is usually the word itself\n",
     "            nearest = (-similarity).argsort()[1:NN + 1]\n",
     "            neighbors = [id_to_token[nearest[n]] for n in range(NN)]\n",
     "            print('{}:\\t{}'.format(valid_word, ', '.join(neighbors)))\n",
     "\n",
     "    @staticmethod\n",
     "    def _get_similarity(valid_word_idx):\n",
     "        # cosine similarity between one word and every word in the vocabulary\n",
     "        target = np.full(shape=vocab_size, fill_value=valid_word_idx)\n",
     "        context = np.arange(vocab_size)\n",
     "        return validation_model.predict([target, context])\n",
     "\n",
     "\n",
     "evaluation = EvalCallback()"
   ]
  },
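  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The analogy test follows the vector-offset approach: for an analogy a : b :: c : d, it ranks all words by cosine distance to `c + b - a` and records where the correct word d lands; Top-k accuracy is the share of analogies whose answer ranks within the first k. Like the callback, the NumPy sketch below does not exclude the three query words from the candidates, which tends to lower the reported accuracy because those words often rank near the top. `analogy_topk_accuracy` and the toy embeddings are illustrative assumptions, and the sketch computes cosine similarity with normalized dot products instead of `scipy.spatial.distance.cdist`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def analogy_topk_accuracy(embeddings, a, b, c, actual, ks=(1, 5, 10)):\n",
    "    # Hypothetical helper: share of analogies a:b :: c:actual answered within top k\n",
    "    query = embeddings[c] + embeddings[b] - embeddings[a]            # (n, dim)\n",
    "    unit_emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)\n",
    "    unit_q = query / np.linalg.norm(query, axis=1, keepdims=True)\n",
    "    ranking = np.argsort(-(unit_q @ unit_emb.T), axis=1)             # most similar first\n",
    "    rank_of_answer = np.argmax(ranking == actual[:, None], axis=1)   # position of the answer\n",
    "    return {k: float(np.mean(rank_of_answer < k)) for k in ks}\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(2)\n",
    "emb = rng.normal(size=(20, 10))\n",
    "emb[3] = emb[2] + emb[1] - emb[0]   # plant an exact analogy: 0 : 1 :: 2 : 3\n",
    "acc = analogy_topk_accuracy(emb, np.array([0]), np.array([1]), np.array([2]), np.array([3]))\n",
    "print(acc)   # {1: 1.0, 5: 1.0, 10: 1.0}\n",
    "```"
   ]
  },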
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
     "#### TensorBoard Callback"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "At the time of writing, TensorFlow has a [bug](https://github.com/tensorflow/tensorflow/issues/32902) that prevents the embedding metadata from working. The GitHub issue points to a simple fix that you can apply to the TensorFlow source code: search for the offending line and change it accordingly until a later release remedies the problem. You will need to install TensorFlow with `pip` for this purpose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T17:20:45.955490Z",
     "start_time": "2020-06-21T17:20:45.830049Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "tensorboard = TensorBoard(log_dir=str(tb_path),\n",
    "                          write_graph=True,\n",
    "                          embeddings_freq=1,\n",
    "                          embeddings_metadata={'embedding_layer': \n",
    "                                               str(tb_path / 'meta.tsv')})"
   ]
  },
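  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `embeddings_metadata` argument points TensorBoard to a label file, `meta.tsv`, for the `embedding_layer` weights. For a single column of labels, the Projector expects one label per line, in the same order as the rows of the embedding matrix, with no header line. The sketch below writes such a file; `write_projector_metadata` is a hypothetical helper that assumes row i corresponds to token id i and fills any id missing from `id_to_token` (for example, a padding id) with a placeholder.\n",
    "\n",
    "```python\n",
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def write_projector_metadata(id_to_token, vocab_size, path, placeholder='<unk>'):\n",
    "    # Hypothetical helper: one label per embedding row (row i -> token id i);\n",
    "    # id_to_token may be a dict or a pandas Series (both support .get)\n",
    "    labels = [str(id_to_token.get(i, placeholder)) for i in range(vocab_size)]\n",
    "    Path(path).write_text('\\n'.join(labels) + '\\n', encoding='utf-8')\n",
    "    return len(labels)\n",
    "\n",
    "\n",
    "demo_path = Path(tempfile.mkdtemp()) / 'meta.tsv'\n",
    "n_rows = write_projector_metadata({1: 'earnings', 2: 'revenue'}, vocab_size=3, path=demo_path)\n",
    "print(demo_path.read_text(encoding='utf-8').splitlines())  # ['<unk>', 'earnings', 'revenue']\n",
    "```"
   ]
  },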
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Train Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T18:20:20.282562Z",
     "start_time": "2020-06-21T17:23:24.880951Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "10 Nearest Neighbors:\n",
      "products:\tlauda, logo_pictured_pasadena_california, confluence, prevails, waltz, boca, stk, amerco, revolutionary_guard, appeals_court_upheld\n",
      "share:\trampant_corruption, uncorrelated, databases, separations, programmatic_advertising, prg, causes, purcari, stifling, glaring\n",
      "website:\tbinding_letter_intent, cherokee, amami_oshima, sprung, confirmed, barriers_entry, sanctions, replies, airtel, experience_litigating\n",
      "ebitda:\trefile, trillion_cubic_feet, sex_daniels, danger, morris_plains, volvo_trucks, duncan, nidaa_tounes, collars, rotating_presidency\n",
      "the:\tpetrofac, chronic_constipation, barnard, failed_putsch, trillium, iata, narrow_browser_and, vacations, rubbermaid, shulman\n",
      "president:\thenkel, unanimous, erectile_dysfunction, rank, irri_al_tal, energetic, arithmetic, seems, spigot, perceived\n",
      "investors:\tfavourable_variance, interacting, schulte, loomed, logility, lack_clarity, solid, simon_robinson, confirmation, antibiotic_resistance\n",
      "long:\tvegetables, america_movil, bvl, reindeer, subpoena, surely, sorry, eventually, insatiable, xlf\n",
      "increased:\tfielded, wonders, touching, convergint, poloz, unionist_party, wyoming, inclusivity, removing, robotics\n",
      "prior:\twarehouse_clubs, heats, transaction, dayrate, el_segundo, coroner, suffolk_county, vanishing, newbuilds, shenhua_energy\n",
      "\n",
      "Analogy Accuracy:\n",
      "\tTop 1: 0.00%\n",
      "\tTop 5: 0.01%\n",
      "\tTop 10: 0.02%\n",
      "25805/25805 [==============================] - ETA: 0s - loss: 0.4945\n",
      "Analogy Accuracy:\n",
      "\tTop 1: 0.45%\n",
      "\tTop 5: 3.66%\n",
      "\tTop 10: 5.63%\n",
      "25805/25805 [==============================] - 3152s 122ms/step - loss: 0.4945\n",
      "\n",
      "10 Nearest Neighbors:\n",
      "products:\tproprietary, innovative, product, specialty, focused, advanced, highly_engineered, provider, systems, development\n",
      "share:\tqtrly, common, earnings, tangible_book_value, nii, par_value, stockholders, basic, repurchased, equity\n",
      "website:\tavailable, events_presentations, pdf, web_site, webpage, accessed, webcasts, webcast, tab, profile_sedar\n",
      "ebitda:\tadjusted, ebitdax, ffo, non, income, computed_accordance, eps, purposes_calculating, determined_accordance, calculated\n",
      "the:\tst, ms, dr, learn, blackstone, atlanta, san_jose, dana, phoenix, vancouver\n",
      "president:\tsaid, omar, gonzalez, sung, antonio, chairman, adrian, olivier, eduardo, vice\n",
      "investors:\tpage, prepared_remarks, releases, report, at, download, mcdermott, filing, hartford, blackrock\n",
      "long:\thelped, small, economies, buying, up, showing, boost, shortage, tight, reach\n",
      "increased:\tdecrease, primarily, partially_offset, thousand, percentage, gross, equivalent, increase, revenues, loans\n",
      "prior:\trepurchase_program, highlights, fiscal, repurchase_authorization, reaffirms, subsidiaries, fourth, period, december, depositary\n"
     ]
    }
   ],
   "source": [
     "history = model.fit(x=[target_word, context_word],\n",
     "                    y=labels,\n",
     "                    shuffle=True,\n",
     "                    batch_size=BATCH_SIZE,\n",
     "                    epochs=EPOCHS,\n",
     "                    # callbacks=[evaluation, tensorboard],  # uncomment once the TensorBoard bug is fixed\n",
     "                    callbacks=[evaluation])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T18:20:25.029057Z",
     "start_time": "2020-06-21T18:20:24.762730Z"
    }
   },
   "outputs": [],
   "source": [
    "model.save(str(results_path / 'skipgram_model.h5'))"
   ]
  },
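  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides the full Keras model, it can be convenient to export just the learned vectors, i.e. the weight matrix returned by `embedding.get_weights()[0]`, in the plain-text word2vec format that gensim reads with `KeyedVectors.load_word2vec_format(path, binary=False)`. The first line holds the number of vectors and their dimension, followed by one line per word with its vector components. The sketch uses a toy matrix and a hypothetical helper `save_word2vec_text`; it assumes that row i belongs to token id i and that tokens contain no spaces, which appears to hold here since phrases are joined with underscores.\n",
    "\n",
    "```python\n",
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def save_word2vec_text(vectors, id_to_token, path):\n",
    "    # Hypothetical helper: header 'n_vectors dim', then 'token v1 v2 ...' per row.\n",
    "    # Rows whose id has no token (e.g., padding) are skipped.\n",
    "    rows = [(id_to_token[i], vec) for i, vec in enumerate(vectors) if i in id_to_token]\n",
    "    lines = [f'{len(rows)} {vectors.shape[1]}']\n",
    "    lines += [token + ' ' + ' '.join(f'{x:.6f}' for x in vec) for token, vec in rows]\n",
    "    Path(path).write_text('\\n'.join(lines) + '\\n', encoding='utf-8')\n",
    "    return len(rows)\n",
    "\n",
    "\n",
    "demo_vectors = np.arange(6, dtype=float).reshape(3, 2)   # toy 3 x 2 embedding matrix\n",
    "demo_file = Path(tempfile.mkdtemp()) / 'vectors.txt'\n",
    "save_word2vec_text(demo_vectors, {1: 'earnings', 2: 'revenue'}, demo_file)\n",
    "print(demo_file.read_text(encoding='utf-8'))\n",
    "# 2 2\n",
    "# earnings 2.000000 3.000000\n",
    "# revenue 4.000000 5.000000\n",
    "```"
   ]
  },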
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "### Visualize Embeddings Using TensorBoard"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "In the TensorBoard Projector, use the `Load` option to load the embedding metadata file (`meta.tsv`) so that each point is labeled with its word; see the [TensorBoard tutorial](https://www.tensorflow.org/tensorboard/get_started) for general usage instructions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T18:20:27.519493Z",
     "start_time": "2020-06-21T18:20:27.516263Z"
    }
   },
   "outputs": [],
   "source": [
    "%load_ext tensorboard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-21T18:20:31.551845Z",
     "start_time": "2020-06-21T18:20:29.366869Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "      <iframe id=\"tensorboard-frame-904a6e3001b929f9\" width=\"100%\" height=\"800\" frameborder=\"0\">\n",
       "      </iframe>\n",
       "      <script>\n",
       "        (function() {\n",
       "          const frame = document.getElementById(\"tensorboard-frame-904a6e3001b929f9\");\n",
       "          const url = new URL(\"/\", window.location);\n",
       "          url.port = 6008;\n",
       "          frame.src = url;\n",
       "        })();\n",
       "      </script>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%tensorboard --logdir results/financial_news/tensorboard/train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resources\n",
    "\n",
    "- [Distributed representations of words and phrases and their compositionality](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)\n",
     "- [Efficient estimation of word representations in vector space](https://arxiv.org/pdf/1301.3781.pdf)\n",
    "- [Sebastian Ruder's Blog](http://ruder.io/word-embeddings-1/)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python [conda env:ml4t] *",
   "language": "python",
   "name": "conda-env-ml4t-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "47px",
    "left": "38px",
    "right": "1340px",
    "top": "66.5px",
    "width": "362px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
